DISTRIBUTION OF ALIGNED LETTER PAIRS IN OPTIMAL 
ALIGNMENTS OF RANDOM SEQUENCES 



RAPHAEL HAUSER AND HEINRICH MATZINGER 

Abstract. Considering the optimal alignment of two i.i.d. random sequences of length
n, we show that when the scoring function is chosen randomly, almost surely the empirical
distribution of aligned letter pairs in all optimal alignments converges to a unique
limiting distribution as n tends to infinity. This result is interesting because it helps in
understanding the microscopic path structure of a special type of last passage percolation
problem with correlated weights, an area of long-standing open problems. Characterizing
the microscopic path structure furthermore yields a robust alternative to optimal
alignment scores for testing the relatedness of genetic sequences.



1. Introduction 

1.1. Basic Definitions and Overview. Let $A$ denote a finite alphabet, and let us
consider alignments with gaps of two strings $x$ and $y$ of equal length $n$ consisting of
letters from $A$. For each such alignment $\pi$ we may count the number of letter pairs of
each type aligned with each other and divide this number by $n$. Collecting these
ratios for each possible pair of letters results in what we call the empirical distribution
vector, denoted by $p_\pi(x,y)$. We remark that this is the empirical probability
distribution in the classical sense scaled by a factor $r \ge 1$ that is due to the presence of
gaps. The classical distribution can of course be recovered by normalizing our notion of
distribution. The set of all such empirical distribution vectors for given strings $x$ and $y$
will be denoted by

$$\mathrm{SET}(x,y) := \{p_\pi(x,y) : \pi \text{ is an alignment with gaps of } x \text{ and } y\}.$$

Let us give an example to illustrate the concepts we just introduced: Take $n = 4$, consider the strings
$x = aabb$ and $y = abab$, where the alphabet consists of two letters, $A = \{a, b\}$, and let us look at a few
alignments with gaps of $x$ and $y$. First, let $\pi$ be given by

$$\begin{array}{c|ccccc} x & a & a & b & \mathfrak{G} & b \\ \hline y & a & \mathfrak{G} & b & a & b \end{array}$$

There is one $a$ aligned with $a$ and hence the coefficient $p_{aa} = 1/4$. Two pairs of letters $b$ are aligned
with each other, so $p_{bb} = 2/4 = 0.5$. One letter $a$ from $x$ is aligned with a gap, so that $p_{a\mathfrak{G}} = 1/4$. Here
and elsewhere we use the symbol $\mathfrak{G}$ for a gap. And finally, one letter $a$ from $y$ is aligned with a gap, so
that $p_{\mathfrak{G}a} = 1/4$. The empirical distribution vector of the alignment $\pi$ is now given by

$$p_\pi(x,y) = (p_{aa}, p_{ab}, p_{a\mathfrak{G}}, p_{ba}, p_{bb}, p_{b\mathfrak{G}}, p_{\mathfrak{G}a}, p_{\mathfrak{G}b}) = (0.25, 0, 0.25, 0, 0.5, 0, 0.25, 0).$$
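A second alignment $\nu$ is given by

$$\begin{array}{c|cccc} x & a & a & b & b \\ \hline y & a & b & a & b \end{array}$$

This time we find the following empirical distribution,

$$p_\nu(x,y) = (p_{aa}, p_{ab}, p_{a\mathfrak{G}}, p_{ba}, p_{bb}, p_{b\mathfrak{G}}, p_{\mathfrak{G}a}, p_{\mathfrak{G}b}) = (0.25, 0.25, 0, 0.25, 0.25, 0, 0, 0).$$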



Finally, consider the alignment $\mu$ given by

$$\begin{array}{c|ccccc} x & a & a & b & \mathfrak{G} & b \\ \hline y & a & b & \mathfrak{G} & a & b \end{array}$$

The empirical distribution is given by

$$p_\mu(x,y) = (p_{aa}, p_{ab}, p_{a\mathfrak{G}}, p_{ba}, p_{bb}, p_{b\mathfrak{G}}, p_{\mathfrak{G}a}, p_{\mathfrak{G}b}) = (0.25, 0.25, 0, 0, 0.25, 0.25, 0.25, 0).$$



Note that the coefficients of the empirical distribution vector do not usually add up to one. For
example, in the case of alignment $\mu$, we get

$$p_{aa} + p_{ab} + p_{a\mathfrak{G}} + p_{ba} + p_{bb} + p_{b\mathfrak{G}} + p_{\mathfrak{G}a} + p_{\mathfrak{G}b} = 1.25 > 1,$$

the reason being that we divided by the length $n$ of the strings instead of the number of columns of
the alignment. If the alignment were without gaps, the coefficients would add up to one and represent
frequencies, but in general this is not the case. However, the coefficients of the empirical distribution
vector are proportional to the actual frequencies and hence may be thought of intuitively as representing
these.
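The bookkeeping of this example can be reproduced mechanically. The following Python sketch is illustrative only: representing an alignment as a list of columns, the name `empirical_distribution`, and the character "-" as a stand-in for the gap symbol are assumptions made here, not notation from the paper.

```python
from collections import Counter
from fractions import Fraction

GAP = "-"  # assumed stand-in for the gap symbol


def empirical_distribution(columns, n):
    """Count each aligned pair and divide by the string length n,
    not by the number of columns of the alignment."""
    counts = Counter(columns)
    return {pair: Fraction(count, n) for pair, count in counts.items()}


# Two gapped alignments of x = aabb and y = abab, written column by column
# as (symbol from x, symbol from y).
pi = [("a", "a"), ("a", GAP), ("b", "b"), (GAP, "a"), ("b", "b")]
mu = [("a", "a"), ("a", "b"), ("b", GAP), (GAP, "a"), ("b", "b")]

p_pi = empirical_distribution(pi, n=4)
p_mu = empirical_distribution(mu, n=4)

print(p_pi[("b", "b")])    # coefficient p_bb of the first alignment
print(sum(p_mu.values()))  # total mass of the second; exceeds 1 because of gaps
```

Exact fractions are used so that the totals can be compared with the hand computation without rounding.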

Let us next consider two random strings $X = X_1 \dots X_n$ and $Y = Y_1 \dots Y_n$, where the $X_i$
and $Y_i$ are i.i.d. random variables taking values in the alphabet $A$. Consider the set of all
empirical distribution vectors that can be obtained from aligning $X$ with $Y$ and inserting
the gaps in different places. We denote the convex hull of this set by

$$\mathrm{SET}^n := \mathrm{conv}(\mathrm{SET}(X, Y)).$$

One of our main results, Theorem 2.1, will establish that

$$\lim_{n\to\infty} d(\mathrm{SET}^n, \mathrm{SET}) = 0 \quad \text{almost surely},$$

where SET is a unique limiting set that only depends on the distribution of the sequences
$X$ and $Y$, but not on their realization, and where $d(\cdot,\cdot)$ denotes the Hausdorff distance
between two subsets of $\mathbb{R}^n$, defined in (1.12) below.

Let $A^*$ denote the alphabet $A$ augmented by the symbol $\mathfrak{G}$, which stands for a gap,
and consider functions $S$ from $A^* \times A^*$ into the set of real numbers. Such functions will
be called scoring functions. For a gapped alignment $\pi$ of $x = x_1 \dots x_n$ and $y = y_1 \dots y_n$,
let us define the score $S_\pi(x,y)$ under a scoring function $S$ as the sum of the scores of the
aligned symbol pairs from $A^*$. An alignment of $x$ and $y$ is called optimal under $S$ if it
maximizes $S_\pi(x,y)$ amongst all gapped alignments of $x$ and $y$.

Another main result of this paper, Theorem 3.2, shows that when the scoring function
$S$ is chosen at random, the empirical distribution of any optimal alignment of the random
strings $X = X_1 \dots X_n$ and $Y = Y_1 \dots Y_n$ with respect to $S$ almost surely approaches a
unique limiting vector $p_S$ as $n$ tends to infinity. Apart from the realization of $S$, this limit
vector only depends on the distribution of $X$ and $Y$, but not on their realizations.

Let us further illustrate the concept of optimal alignment of two strings by means of an example.
Consider a scoring function $S$ that takes the value 1 for identically aligned letters, and 0 otherwise. In
the context of the three examples of alignments introduced earlier, we find the following scores,

$$\begin{aligned}
S_\pi(x,y) &= S(a,a) + S(a,\mathfrak{G}) + S(b,b) + S(\mathfrak{G},a) + S(b,b) = 1 + 0 + 1 + 0 + 1 = 3, \\
S_\mu(x,y) &= S(a,a) + S(a,b) + S(b,\mathfrak{G}) + S(\mathfrak{G},a) + S(b,b) = 1 + 0 + 0 + 0 + 1 = 2, \\
S_\nu(x,y) &= S(a,a) + S(a,b) + S(b,a) + S(b,b) = 1 + 0 + 0 + 1 = 2.
\end{aligned}$$

One could check that the score of 3 is not exceeded for any alignment of $x$ and $y$ and hence find that $\pi$
maximizes the alignment score of $x$ and $y$ under $S$. We thus say that $\pi$ is an optimal alignment under $S$,
although note that it may not be the unique alignment with this property.

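The alignment score can also be viewed as the value that an appropriately defined linear
functional takes on the empirical distribution vector of an alignment: Let $f_S : \mathbb{R}^8 \to \mathbb{R}$
be defined by

$$\begin{aligned}
f_S(p) &= f_S(p_{aa}, p_{ab}, p_{a\mathfrak{G}}, p_{ba}, p_{bb}, p_{b\mathfrak{G}}, p_{\mathfrak{G}a}, p_{\mathfrak{G}b}) \\
&= S(a,a)p_{aa} + S(a,b)p_{ab} + S(a,\mathfrak{G})p_{a\mathfrak{G}} + S(b,a)p_{ba} + S(b,b)p_{bb} \\
&\quad + S(b,\mathfrak{G})p_{b\mathfrak{G}} + S(\mathfrak{G},a)p_{\mathfrak{G}a} + S(\mathfrak{G},b)p_{\mathfrak{G}b}.
\end{aligned} \tag{1.1}$$

It is then the case that

$$S_\pi(x,y) = n f_S(p_\pi(x,y)) \tag{1.2}$$

holds for any alignment $\pi$ of $x$ and $y$.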

Recall that $\mathrm{SET}(x,y)$ is defined as the set of all empirical distribution vectors of
alignments of $x$ and $y$,

$$\mathrm{SET}(x,y) := \{p_\pi(x,y) : \pi \text{ is an alignment of } x \text{ and } y\}.$$

In particular, in our current example $\mathrm{SET}(x,y) = \mathrm{SET}(aabb, abab)$ contains the vectors

$$\begin{aligned}
p_\mu(x,y) &= (0.25, 0.25, 0, 0, 0.25, 0.25, 0.25, 0), \\
p_\pi(x,y) &= (0.25, 0, 0.25, 0, 0.5, 0, 0.25, 0), \\
p_\nu(x,y) &= (0.25, 0.25, 0, 0.25, 0.25, 0, 0, 0).
\end{aligned}$$

Further, we write

$$L_S(x,y) := \max_{\pi} S_\pi(x,y) \tag{1.3}$$

for the optimal alignment score of $x$ and $y$, where the maximum is taken over all gapped
alignments $\pi$ of $x$ with $y$. Thus, in our current example we have $L_S(aabb, abab) = 3$. The
rescaled maximum alignment score can also be seen as the maximum value taken by the
functional $f_S$ over $\mathrm{SET}(x,y)$,

$$\frac{L_S(x,y)}{n} = \max_{p \in \mathrm{SET}(x,y)} f_S(p).$$
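For strings as short as those in the example, the identity between the rescaled optimal score and the maximum of the linear functional over SET(x, y) can be checked by brute force. The sketch below is a hypothetical illustration: the helper names, the column representation of alignments, and "-" as the gap symbol are assumptions of this example, and exhaustive enumeration is only feasible for very small n.

```python
from collections import Counter
from fractions import Fraction

GAP = "-"  # assumed stand-in for the gap symbol


def alignments(x, y):
    """Yield every gapped alignment of x and y as a list of columns."""
    if not x and not y:
        yield []
        return
    if x and y:  # align the two leading letters with each other
        for rest in alignments(x[1:], y[1:]):
            yield [(x[0], y[0])] + rest
    if x:        # align the leading letter of x with a gap
        for rest in alignments(x[1:], y):
            yield [(x[0], GAP)] + rest
    if y:        # align the leading letter of y with a gap
        for rest in alignments(x, y[1:]):
            yield [(GAP, y[0])] + rest


def S(c, d):
    """Scoring function of the example: 1 for identical letters, 0 otherwise."""
    return 1 if c == d and c != GAP else 0


def alignment_score(columns):
    return sum(S(c, d) for c, d in columns)


def f_S(columns, n):
    """Evaluate the linear functional at the empirical distribution vector."""
    counts = Counter(columns)
    return sum(S(c, d) * Fraction(k, n) for (c, d), k in counts.items())


x, y = "aabb", "abab"
all_alignments = list(alignments(x, y))
L = max(alignment_score(a) for a in all_alignments)
max_f = max(f_S(a, len(x)) for a in all_alignments)
print(len(all_alignments), L, max_f)
```

The number of alignments grows exponentially in n, which is why practical computations use dynamic programming instead of enumeration.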

1.2. Motivation. 



1.2.1. First Passage Percolation. The problem of understanding the structure of an optimal
path in first and last passage percolation was recognized as being important
several decades ago but still remains largely unresolved, see Kesten [12]. Consider the set
of edges

$$E := \{\{(z,w), (z,w+1)\}, \{(z,w), (z+1,w)\} : z, w \in \mathbb{Z}\}$$

of the integer lattice. $E$ thus consists of vertical and horizontal edges of unit length
incident to points in $\mathbb{R}^2$ with integer coordinates. Consider a setup in which a random
weight $w(e)$ is associated with each edge $e \in E$. In the classical setting of First Passage
Percolation, these random weights are i.i.d., and a path of smallest total weight
between two points $a$ and $b$ is sought. Any admissible path must consist of consecutive
adjacent edges $e_1, e_2, \dots, e_n \in E$, and $e_1$ and $e_n$ must be incident to $a$ and $b$ respectively.
The weights can also be interpreted as the time it takes to cross an edge, with the total
weight $w(e_1) + w(e_2) + \dots + w(e_n)$ of the path corresponding to the passage time from
$a$ to $b$ via the chosen path, and a minimum weight path corresponding to a fastest link
between the two points.

An example of an open problem relating to the microstructure of an optimal path
in first passage percolation is the following (see Kesten [12]): Consider the two points
$a = (0,0)$ and $b = (0,n)$. What is the proportion of vertical and horizontal edges in a
shortest path from the point $(0,0)$ to $(0,n)$, and does this proportion converge as $n$ goes
to infinity? We shall now argue that these questions are closely related to the central
problem of this paper, which is to find the limiting empirical distribution of the aligned
letter pairs. Consider the set of oriented edges

$$E' := \{((z,w),(z,w+1)), ((z,w),(z+1,w)), ((z,w),(z+1,w+1)) : z, w \in \mathbb{Z}\},$$

let $x = x_1 \dots x_n$ and $y = y_1 \dots y_n$ be strings of letters from the alphabet $A$, let a scoring
function $S : A^* \times A^* \to \mathbb{R}$ be given, and let the weight $w(e)$ of an edge of type
$e = ((z,w),(z+1,w+1))$ (a diagonal edge) be equal to the score obtained by aligning the
letter $x_{z+1}$ with $y_{w+1}$,

$$w(e) = S(x_{z+1}, y_{w+1}).$$

For edges of type $e = ((z,w),(z+1,w))$ (horizontal edges), let the weight $w(e)$ be given
by the score of aligning the letter $x_{z+1}$ with a gap,

$$w(e) = S(x_{z+1}, \mathfrak{G}).$$

Likewise, for vertical edges $e = ((z,w),(z,w+1))$, let

$$w(e) = S(\mathfrak{G}, y_{w+1}).$$

In this manner, the problem of aligning $x$ with $y$ optimally according to the scoring
function $S$ becomes a Last Passage Percolation problem. The optimal alignment score
$L_S(x,y)$ equals the weight of the maximum weight path going from $(0,0)$ to $(n,n)$. An
optimal path $e_1 e_2 \dots e_m$, that is, a path of maximum total weight among those that follow
oriented edges from $E'$ and link $(0,0)$ to $(n,n)$, defines an optimal alignment of $x$ with
$y$ in the following fashion: For any diagonal edge $((z,w),(z+1,w+1))$ that lies along
the path, align the letter $x_{z+1}$ with $y_{w+1}$. Align all other letters with gaps. Now note
that when we know the limit of the empirical distribution vector of the aligned letter
pairs, we also know the proportion of gaps in the long run. In other words, the limiting
distribution of the aligned letter pairs yields the asymptotic proportion of horizontal and
vertical edges in the optimal path in $E'$. This information is equivalent to knowing
the asymptotic proportion of vertical and horizontal edges in last passage percolation in
a model where the distribution of edge weights is different, and where diagonal edges
are present. The corresponding results for first passage percolation can be obtained by
multiplying the edge weights by $-1$.
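The last passage formulation translates directly into a dynamic program over the lattice points (z, w). The following sketch is an illustration under stated assumptions rather than the authors' implementation: the function name `optimal_alignment_score`, the argument layout, and the use of "-" for the gap symbol are hypothetical choices.

```python
def optimal_alignment_score(x, y, S, gap="-"):
    """Maximum total weight of an oriented lattice path from (0, 0) to
    (len(x), len(y)) that uses horizontal, vertical and diagonal edges."""
    n, m = len(x), len(y)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for z in range(n + 1):
        for w in range(m + 1):
            if z > 0 and w > 0:  # diagonal edge: letter aligned with letter
                best[z][w] = max(best[z][w],
                                 best[z - 1][w - 1] + S(x[z - 1], y[w - 1]))
            if z > 0:            # horizontal edge: letter of x aligned with a gap
                best[z][w] = max(best[z][w],
                                 best[z - 1][w] + S(x[z - 1], gap))
            if w > 0:            # vertical edge: letter of y aligned with a gap
                best[z][w] = max(best[z][w],
                                 best[z][w - 1] + S(gap, y[w - 1]))
    return best[n][m]


def lcs_scoring(c, d):
    """1 for identical letters, 0 for everything else (gaps included)."""
    return 1 if c == d and c != "-" else 0


def gap_penalty_scoring(c, d):
    """Hypothetical variant: identical letters score 1, gaps cost 1."""
    if c == "-" or d == "-":
        return -1
    return 1 if c == d else 0


print(optimal_alignment_score("aabb", "abab", lcs_scoring))
print(optimal_alignment_score("aabb", "abab", gap_penalty_scoring))
```

Negating every value returned by the scoring function and replacing `max` with `min` would give the corresponding first passage version.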

1.2.2. Computational Genomics. In computational genomics the alignment score is a
maximum likelihood ratio to decide which alignment is the most likely association of
sequences that diverged by evolution. Any gapped alignment of two DNA or RNA strings
$x = x_1 \dots x_n$ and $y = y_1 \dots y_n$ represents a hypothesis about the evolutionary history of
the two species relative to one another. Assume that the strings $x$ and $y$ are sections of
genetic sequences from two extant species with a common ancestor. Aligning $x_i$ with $y_j$
corresponds to the hypothesis that both letters descended from a particular letter in the
corresponding section in the ancestor's genome. A letter aligned with a gap may correspond
to a letter in the ancestral genome that disappeared in one of the two descendants
or a new letter that appeared through mutation. The value $S(a,b)$ of the scoring function
at $a, b \in A^*$ is equal to the logarithm of the probability that a letter from the ancestral
genome evolved into a letter $a$ in one of the extant species and into a letter $b$ in the other,
assuming that letters mutate independently of their neighbors^1. Naturally, the scoring
function depends on how long ago the two species got separated in the evolutionary tree.
Given more time since separation, the probability of mutation increases, and as a result
the scoring function also will look different.

In practice it can sometimes be difficult to determine whether two given sequences are 
related or not, since a good choice of the scoring function S may not be known a priori
in the absence of a good estimate of the time since evolutionary divergence between the two
extant species. Sections of DNA-sequences might also look similar because they have 
a similar distribution of amino acids rather than being related by direct evolution from 
a particular section of the ancestral genome. An approach to overcoming this problem 
is to relate sequences by the microscopic path structure of optimal alignments. Hirmo, 
Lember and Matzinger [10] found that optimal alignments of related and non-related 
sequences have entirely different microscopic structures. Preliminary experiments showed 
that basing a relatedness test on path structure works at least as well as the widely used 
test based on the BLASTZ algorithm, which works with a sophisticated scoring function. 
The results of our paper will help to take this work further. 

1.3. Monte Carlo Simulation. An appealing approach to the investigation of asymptotic
qualities of optimal alignments of random sequences is to use Monte Carlo simulation.
However, to derive rigorous bounds one usually needs to know the variance $\mathrm{VAR}(L_n(S))$
of the alignment score as a function of the length $n$ of the aligned random sequences
$X_1 \dots X_n$ and $Y_1 \dots Y_n$, and this order of fluctuation is not yet well understood. In the
special case of the longest common subsequence problem with sequences consisting of i.i.d.
Ber($p$) variables, Chvatal and Sankoff [6] conjectured that $\mathrm{VAR}(L_n(S)) = o(n^{2/3})$ when
$p = 0.5$. Steele [19] later proved the bound $\mathrm{VAR}(L_n(S)) \le 2p(1-p)n$. Waterman [21]
asked the question of whether this bound can be improved and found that for $p < 0.5$,
simulations suggest that the dependence of $\mathrm{VAR}(L_n(S))$ on $n$ is linear. Boutet de Monvel
[1] found that this also applies to the case $p = 0.5$, although the linear growth only sets
in for very large $n$. Lember and Matzinger [H] gave a rigorous proof of the linear order
$\mathrm{VAR}(L_n(S)) = \Theta(n)$ in the case where $p$ is very small. Their analysis was based on
showing that the manipulation of randomly selecting a letter of specified type from one of
the two sequences and changing it into another specified type has a positive biased effect
on the optimal alignment score. In Section 4 we significantly extend the applicability of
this result to general scoring functions and random sequences whose distributions are not
highly asymmetric: Theorem 4.2 yields a sufficient criterion under which the asymptotic
order of fluctuation $\mathrm{VAR}(L_n(S)) = \Theta(n)$ holds.

1 This is of course an inexact approximation of true mutation dynamics.
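A Monte Carlo experiment of the kind discussed here can be sketched in a few lines for the longest common subsequence case. This is a hedged illustration only: the function names, the sample sizes, and the seed are arbitrary assumptions, and a plain sample variance at a single moderate n says nothing rigorous about the asymptotic order of fluctuation.

```python
import random
from statistics import mean, variance


def lcs_length(x, y):
    """Length of the longest common subsequence via the standard recurrence."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def simulate_lcs(n, p, trials, seed=0):
    """Sample mean and sample variance of L_n for two i.i.d. Ber(p) strings."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        x = [rng.random() < p for _ in range(n)]
        y = [rng.random() < p for _ in range(n)]
        scores.append(lcs_length(x, y))
    return mean(scores), variance(scores)


n, p = 50, 0.5
m, v = simulate_lcs(n, p, trials=200, seed=1)
print(m / n)                # crude estimate of the rescaled expected score
print(v, 2 * p * (1 - p) * n)  # sample variance next to the bound 2p(1-p)n
```

Repeating the run over a range of n and plotting the sample variance against n is the natural next step for examining whether the growth looks linear.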



1.4. Summary of Main Results. We now amend the notation introduced in (1.3) and
write

$$L_n(S) = \max_{\pi} S_\pi(X, Y) \tag{1.4}$$

for the optimal alignment score of the two i.i.d. strings $X = X_1 X_2 \dots X_n$ and
$Y = Y_1 Y_2 \dots Y_n$ with letters from the alphabet $A$. Since $X$ and $Y$ are random, the maximum
score $L_n(S)$ is a random variable, and we are interested in its dependence on the scoring
function $S : A^* \times A^* \to \mathbb{R}$ in the asymptotic regime where the length $n$ of the strings
tends to infinity. Let $\lambda_n(S)$ denote the rescaled expected alignment score,

$$\lambda_n(S) := \frac{E[L_n(S)]}{n}. \tag{1.5}$$

A simple subadditivity argument, see Chvatal & Sankoff [6], shows that

$$\lambda_n(S) \le \lambda_m(S), \quad \forall\, n \le m \in \mathbb{N}, \tag{1.6}$$

$$\lambda(S) := \lim_{n\to\infty} \frac{E[L_n(S)]}{n} \ \text{ exists}, \tag{1.7}$$

$$P\left[\lim_{n\to\infty} \frac{L_n(S)}{n} = \lambda(S)\right] = 1. \tag{1.8}$$

Furthermore, Alexander showed that in the case of the longest common subsequence
problem, the convergence is slower than the order $\sqrt{\ln n}/\sqrt{n}$,

$$\lambda(S) - \lambda_n(S) > C\,\frac{\sqrt{\ln n}}{\sqrt{n}}.$$

In Lemma 6.2 we show that a lower bound of the same order also exists. The exact
value of the Chvatal-Sankoff constant $\lambda(S)$ is unknown even in the simplest cases, and
Monte Carlo simulations to obtain estimates of even moderate accuracy are quite involved
[8], [9]. Note that $\lambda(S)$ depends of course on the distribution of the random strings.

Let $H(S)$ designate the half space

$$H(S) := \{x \in \mathbb{R}^{|A^*|^2} : f_S(x) \le \lambda(S)\}, \tag{1.9}$$

where $f_S$ is defined by

$$f_S : \mathbb{R}^{|A^*|^2} \to \mathbb{R}, \qquad x \mapsto \sum_{a \in A^*} \sum_{b \in A^*} S(a,b)\, x_{ab}.$$

We also consider

$$\mathrm{SET} := \bigcap_{S} H(S), \tag{1.10}$$

where the intersection is taken over all scoring functions $S$. It is immediate from its
definition that SET is a closed convex set, and in Lemma 5.2 we will furthermore show
that it is compact with nonempty interior. Recall also the notation

$$\mathrm{SET}(X, Y) := \{p_\pi(X, Y) : \pi \text{ is an alignment with gaps of } X \text{ and } Y\} \subset \mathbb{R}^{|A^*|^2}$$

introduced earlier for the set of empirical distribution vectors for gapped alignments of X 
and Y. Since X and Y are random strings, SET(X, Y) is a random set. We denote the 
convex hull of this set by 

$$\mathrm{SET}^n := \mathrm{conv}(\mathrm{SET}(X, Y)), \tag{1.11}$$

where we account for the length n of the random strings X and Y notationally because 
we are interested in the asymptotic behavior when n tends to infinity. 






The Hausdorff distance [7] between two sets $A, B \subset \mathbb{R}^n$ is defined as follows, where $\|\cdot\|$
denotes the Euclidean norm,

$$\begin{aligned}
d(A, B) &:= \max\Big\{ \sup_{x \in A} d(x, B),\ \sup_{y \in B} d(A, y) \Big\}, \\
d(x, B) &:= \inf\{\|x - y\| : y \in B\}, \\
d(A, y) &:= \inf\{\|x - y\| : x \in A\}.
\end{aligned} \tag{1.12}$$

Let us now discuss the main results of this paper:

(1) Theorem 2.1 will establish that the random set $\mathrm{SET}^n$ almost surely converges to
the deterministic set SET in terms of Hausdorff distance. As a consequence, if one
were to simulate the sequences $X_1 \dots X_n$ and $Y_1 \dots Y_n$ and compute the convex
hull of the empirical distribution vectors of all their alignments with gaps, one
would find a set that closely resembles SET, provided $n$ is large enough.

(2) Theorem 3.1 will show that the empirical distributions of all optimal alignments
of $X$ and $Y$ almost surely converge to a deterministic distribution as $n$ tends to
infinity, on condition that the scoring function $S$ be chosen such that $f_S$ has a
unique maximizer in SET. When this condition is met, we denote the unique
maximizer by $p_S$. The statement of the theorem then says that the probability
that there exists an optimal alignment of $X$ and $Y$ with respect to $S$ with empirical
distribution further away than $\varepsilon > 0$ from $p_S$ is exponentially small in
$n$, where $\varepsilon$ is an arbitrarily small constant independent of $n$.

(3) The condition of Theorem 3.1 is difficult to verify in practice, but Theorem 5.1
shows that when the scoring function $S$ is chosen randomly, then the condition is
almost surely met, that is, $f_S$ has a unique maximizer in SET with probability
1. As a corollary, we obtain Theorem 3.2, which says that for almost all scoring
functions $S$ the empirical distributions of all optimal alignments of $X$ and $Y$ almost
surely converge to a deterministic distribution.

(4) Theorem 3.2 allows for the derivation of a sufficient criterion to guarantee that
the order of fluctuation of the optimal alignment score is linear in the length $n$
of the aligned random strings $X_1 \dots X_n$ and $Y_1 \dots Y_n$. The sufficient criterion
depends on the scoring function $S$ and on the unique maximizer $p_S$ of $f_S$ over
SET and constitutes a practical tool in the design of a statistical test on the order
of fluctuation of the optimal score, see [2].
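For finite point sets the Hausdorff distance used in result (1) reduces to a maximum over finitely many point-to-set distances, which makes the definition easy to experiment with. The sketch below is illustrative; the function names are assumptions, and it handles only finite sets of points, not general subsets of Euclidean space.

```python
import math


def dist_point_to_set(x, B):
    """Distance from the point x to the finite set B in the Euclidean norm."""
    return min(math.dist(x, y) for y in B)


def hausdorff_distance(A, B):
    """Hausdorff distance between two finite, nonempty point sets."""
    far_from_B = max(dist_point_to_set(x, B) for x in A)
    far_from_A = max(dist_point_to_set(y, A) for y in B)
    return max(far_from_B, far_from_A)


A = [(0.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0)]
print(hausdorff_distance(A, B))
print(hausdorff_distance([(0.0, 0.0)], [(3.0, 4.0)]))
```

Note that both one-sided terms are needed: every point of A lies in B here, yet the distance is positive because B contains a point far from A.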



1.5. A Few Key Ideas. Let $S : A^* \times A^* \to \mathbb{R}$ be a scoring function, $x$ and $y$ two strings
of length $n$ with letters from $A$, and $\pi$ a gapped alignment of $x$ and $y$. By (1.2), the
optimal alignment score of $x$ and $y$ with respect to $S$ satisfies

$$\max_{\pi} \frac{S_\pi(x,y)}{n} = \max_{p \in \mathrm{SET}(x,y)} f_S(p).$$

Applied to the random strings $X = X_1 \dots X_n$ and $Y = Y_1 \dots Y_n$, we find

$$\frac{L_n(S)}{n} = \max_{p \in \mathrm{SET}(X,Y)} f_S(p). \tag{1.13}$$

A crucial observation is now that the maximum of a linear function over a closed set in
$\mathbb{R}^n$ equals the maximum of the given function over the convex hull of the given set. Since
$\mathrm{SET}^n = \mathrm{conv}(\mathrm{SET}(X,Y))$, (1.13) implies

$$\frac{L_n(S)}{n} = \max_{p \in \mathrm{SET}^n} f_S(p). \tag{1.14}$$

By (1.8), $L_n(S)/n$ almost surely converges to a deterministic constant $\lambda(S)$ which was
also used to define SET, see (1.9) and (1.10). The definition of SET shows further that

$$\max_{p \in \mathrm{SET}} f_S(p) \le \lambda(S), \tag{1.15}$$

and Lemma 5.2 d), proven below, shows that the inequality in (1.15) holds in fact at
equality. Combined with (1.14), this implies

$$\max_{p \in \mathrm{SET}^n} f_S(p) \xrightarrow{\,n \to \infty\,} \max_{p \in \mathrm{SET}} f_S(p) \quad \text{almost surely}. \tag{1.16}$$
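The observation that a linear functional attains the same maximum over a finite set and over its convex hull can be probed numerically. The sketch below samples random convex combinations of the three empirical distribution vectors from the example in Section 1.1 and evaluates the functional for the scoring function that rewards identical letters with 1. It is a sanity check under assumed names and an arbitrary seed, not a proof.

```python
import random


def f(weights, p):
    """Linear functional: inner product of the score vector with p."""
    return sum(w * c for w, c in zip(weights, p))


def random_convex_combination(points, rng):
    """Random point of the convex hull of the given points."""
    raw = [rng.random() + 1e-9 for _ in points]  # strictly positive weights
    total = sum(raw)
    coeffs = [r / total for r in raw]
    dim = len(points[0])
    return tuple(sum(c * pt[k] for c, pt in zip(coeffs, points))
                 for k in range(dim))


# Coordinates ordered as (aa, ab, a-gap, ba, bb, b-gap, gap-a, gap-b).
points = [
    (0.25, 0.0, 0.25, 0.0, 0.5, 0.0, 0.25, 0.0),
    (0.25, 0.25, 0.0, 0.25, 0.25, 0.0, 0.0, 0.0),
    (0.25, 0.25, 0.0, 0.0, 0.25, 0.25, 0.25, 0.0),
]
weights = (1, 0, 0, 0, 1, 0, 0, 0)  # score 1 for aa and bb, 0 otherwise

rng = random.Random(0)
max_on_points = max(f(weights, p) for p in points)
max_on_hull_samples = max(
    f(weights, random_convex_combination(points, rng)) for _ in range(10_000)
)
print(max_on_points, max_on_hull_samples)
```

No sampled hull point exceeds the best vertex, as expected: the value at a convex combination is the same convex combination of the vertex values.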

At a first pass it is illustrative to consider an approximate proof of Theorem 3.2 that
is free of technical details relating to large deviations that will be necessary to render
the proof rigorous. Proposition 5.2 and Theorems 5.4 and 5.1, which will be proven
in Section 5, provide the crucial insight: by (1.16), the conditions of Theorem 5.4 are
approximately met for $C = \mathrm{SET}$ and $C_n = \mathrm{SET}^n$, $n \in \mathbb{N}$, and hence, it is plausible to
argue that $\mathrm{SET}^n \to \mathrm{SET}$. We remark that in rendering this argument rigorous later,
we will use the fact that $L_n(S)/n$ converges to $\lambda(S)$ at a rate on the order of $\ln(n)/\sqrt{n}$,
which follows directly from the Azuma-Hoeffding Inequality, as we shall see in Section 6.
The convergence of $\mathrm{SET}^n$ to SET occurs thus at the same rate. Choosing the scoring
function $S$ at random is tantamount to choosing $f_S$ randomly, whence Theorem 5.1 shows
that the conditions of Proposition 5.2 are satisfied. Theorem 3.2 thus follows.

1.6. Some Key Difficulties. Consider the optimal alignment score $L_n(S)$ for some scoring
function $S$. $L_n(S)$ is a random variable, because it depends on the realization of the
i.i.d. random strings $X = X_1 \dots X_n$ and $Y = Y_1 \dots Y_n$. However, it is easy to see that
changing the realization of only one of these variables results in a change of $L_n(S)$ by at
most the deterministic constant

$$\max_{c, d, e \in A^*} |S(c,d) - S(c,e)|.$$

See Lemma 6.1 for details. One can therefore apply the Azuma-Hoeffding Inequality to
find that, on a scale of $\sqrt{n}$, the tail of $L_n(S)$ decays at least quadratically exponentially
fast, see Lemma 6.3. This powerful tool lends itself to an elegant analysis of the asymptotic
convergence of the alignment score and its fluctuation.
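The bounded-difference property that makes the Azuma-Hoeffding Inequality applicable can be observed empirically in the longest common subsequence case, where the scoring function takes only the values 0 and 1 and the constant above equals 1. The following sketch resamples a single letter and records the largest change of the score; names, sizes, and the seed are assumptions of this illustration.

```python
import random


def lcs_length(x, y):
    """Length of the longest common subsequence via the standard recurrence."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def largest_change_after_one_flip(n, trials, seed=0):
    """Flip one random letter of X per trial and track the largest score change."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = [rng.randint(0, 1) for _ in range(n)]
        base = lcs_length(x, y)
        i = rng.randrange(n)
        flipped = x[:]
        flipped[i] ^= 1  # change the realization of a single letter
        worst = max(worst, abs(lcs_length(flipped, y) - base))
    return worst


worst = largest_change_after_one_flip(n=30, trials=300, seed=0)
print(worst)
```

The score never moves by more than one unit, in line with the deterministic bound, even though the optimal alignment itself may change substantially.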

In contrast, analyzing the convergence of the empirical distribution of letter pairs in 
optimal alignments is much harder: upon changing the realization of one of the ran- 
dom letters, it has to be assumed a priori that the entire optimal alignment and hence 
the relative frequencies at which alignments occur have changed. As a consequence, the 
Azuma-Hoeffding Inequality cannot be applied directly. Luckily, it can be applied indi- 
rectly through the optimal alignment scores of different scoring functions at the cost of 
having to deal with additional technicalities. 

A further key difficulty is that for the scoring functions $S$ under consideration, it is
required that $f_S$ be maximized in only one point on SET. This condition would be met if
SET were known to be strictly convex everywhere, but this seems very difficult to verify in
practice, since the exact shape of SET is unknown: SET corresponds to the asymptotic
shape of the wet zone in the first/last passage percolation formulation of our problem, and
determining the shape of the corresponding zone in standard first passage percolation is a
long-standing open problem in the general case. We get around this problem by showing
that if the scoring function $S$ is chosen at random, then with probability one there exists
a unique maximizer of $f_S$ on SET, see Theorem 5.1.

We remark that convergence of the empirical distribution of aligned letter pairs may not hold when $S$ is
not chosen randomly. For a counter-example, construct a scoring function that takes the value 1 for pairs
of identical letters, 0 otherwise, and take $A = \{0, 1\}$ and $X_i$, $Y_j$ i.i.d. Bernoulli Ber(1/2) variables. The
optimal alignment score then corresponds to the length of the longest common subsequence (LCS), and
the asymptotic empirical distribution of optimally aligned letters is not unique: Write out the optimally
aligned sequences with the introduced gaps and subdivide them into sections of length 3.
One observes empirically that a positive proportion of triplets is of the form

$$\begin{array}{ccc} 0 & 1 & \mathfrak{G} \\ \mathfrak{G} & 1 & 0 \end{array}, \qquad
\begin{array}{ccc} \mathfrak{G} & 0 & 1 \\ 1 & 0 & \mathfrak{G} \end{array}, \qquad
\begin{array}{ccc} \mathfrak{G} & 1 & 0 \\ 0 & 1 & \mathfrak{G} \end{array} \quad \text{or} \quad
\begin{array}{ccc} 1 & 0 & \mathfrak{G} \\ \mathfrak{G} & 0 & 1 \end{array},$$

where the top row belongs to $X$ and the bottom row to $Y$.
The first two correspond to the pattern 01 in $X$ being aligned with the pattern 10 in $Y$, and the last
two to the inverted situation. Thus, the first triplet can be exchanged for the second, and the third for
the fourth without affecting $L_n(S)$ or the optimality of the alignment. The empirical distribution of
optimally aligned letter pairs changes however, as weight is shifted from the pairing (1,1) to the pairing
(0,0).
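The exchange of triplets described in this counter-example can be seen in its smallest instance by enumerating all alignments of the strings 01 and 10. The sketch below is a hypothetical illustration (helper names and "-" as the gap symbol are assumptions); it lists the optimal alignments under the LCS scoring function and the weight each places on the pairings (1,1) and (0,0).

```python
from collections import Counter
from fractions import Fraction

GAP = "-"  # assumed stand-in for the gap symbol


def alignments(x, y):
    """Yield every gapped alignment of x and y as a list of columns."""
    if not x and not y:
        yield []
        return
    if x and y:
        for rest in alignments(x[1:], y[1:]):
            yield [(x[0], y[0])] + rest
    if x:
        for rest in alignments(x[1:], y):
            yield [(x[0], GAP)] + rest
    if y:
        for rest in alignments(x, y[1:]):
            yield [(GAP, y[0])] + rest


def lcs_score(columns):
    """Number of columns in which two identical letters are aligned."""
    return sum(1 for c, d in columns if c == d)


x, y = "01", "10"
candidates = list(alignments(x, y))
best = max(lcs_score(a) for a in candidates)
optimal = [a for a in candidates if lcs_score(a) == best]

for a in optimal:
    counts = Counter(a)
    p11 = Fraction(counts[("1", "1")], len(x))
    p00 = Fraction(counts[("0", "0")], len(x))
    print(a, "p_11 =", p11, "p_00 =", p00)
```

Both optimal alignments reach the same score, yet one puts all of its matched weight on (1,1) and the other on (0,0), which is the mechanism behind the non-uniqueness.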



2. Set Convergence of Empirical Distributions 



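Theorem 2.1. Let SET and $\mathrm{SET}^n$ be the sets defined in (1.10) and (1.11), and let $d$
denote the Hausdorff distance defined in (1.12). Then

$$d(\mathrm{SET}^n, \mathrm{SET}) \xrightarrow{\,n \to \infty\,} 0 \quad \text{almost surely}. \tag{2.1}$$

Proof. By the definition of $d$ in (1.12), we need to prove the two identities

$$P\Big[\max_{x \in \mathrm{SET}^n} d(x, \mathrm{SET}) \xrightarrow{\,n \to \infty\,} 0\Big] = 1, \tag{2.2}$$

$$P\Big[\max_{x \in \mathrm{SET}} d(x, \mathrm{SET}^n) \xrightarrow{\,n \to \infty\,} 0\Big] = 1. \tag{2.3}$$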

To prove Equation (2.2), we use Lemma 5.1, which establishes that, given $\varepsilon > 0$, there
exist finitely many scoring functions $S_1, \dots, S_k$ such that

$$\max_{x \in \bigcap_{i=1}^{k} H(S_i)} d(x, \mathrm{SET}) \le \varepsilon,$$

where the half-spaces

$$H(S_i) := \{x \in \mathbb{R}^{|A^*|^2} : f_{S_i}(x) \le \lambda(S_i)\}$$

are defined as in Equation (1.9). Let $H^+(S_i)$ denote the shifted half-space

$$H^+(S_i) = \Big\{x : f_{S_i}(x) \le \lambda_n(S_i) + \frac{\ln(n)}{\sqrt{n}}\Big\},$$

and let us define the event

$$\mathcal{A}_n(S_i) = \{\omega \in \Omega : \mathrm{SET}^n(\omega) \subset H^+(S_i)\},$$

where $\Omega$ is the probability space over which the random sequences $X$ and $Y$ are defined.
Corollary 6.1 and its proof show that the Azuma-Hoeffding Inequality implies

$$P[\mathcal{A}_n(S_i)^c] \le n^{-c_{S_i} \ln n},$$

where $c_{S_i} > 0$ is a constant that does not depend on $n$, see Theorem 6.1. It follows that

$$P\Big[\mathrm{SET}^n \subset \bigcap_{i=1}^{k} H^+(S_i)\Big] \ge 1 - \sum_{i=1}^{k} n^{-c_{S_i} \ln n} \ge 1 - n^{-c \ln n},$$

where $c > 0$ is a constant independent of $n$. The series $\sum_n n^{-c \ln n}$ being convergent, the
Borel-Cantelli Lemma implies that almost surely there exists $n_0 \in \mathbb{N}$ such that

$$\mathrm{SET}^n \subset \bigcap_{i=1}^{k} H^+(S_i), \quad \forall\, n \ge n_0.$$

By the definition of the $S_i$, this implies that for all $n \ge n_0$, we have

$$\max_{x \in \mathrm{SET}^n} d(x, \mathrm{SET}) \le \varepsilon + C \times \frac{\ln n}{\sqrt{n}}, \tag{2.4}$$

where $C > 0$ is a constant independent of $n$. This implies that

$$\limsup_{n \to \infty} \max_{x \in \mathrm{SET}^n} d(x, \mathrm{SET}) \le \varepsilon, \quad \forall\, \varepsilon > 0.$$

Finally, since this is true for all rational $\varepsilon$, Equation (2.2) follows.

To prove Equation (2.3), we employ Theorem 5.3, which establishes that for any given
$\varepsilon > 0$, there exist points $x_1, \dots, x_k \in \mathrm{SET}_{se}$, a certain subset^2 of the set of extreme points
of SET, chosen such that

$$d(x, \mathrm{conv}(x_1, \dots, x_k)) \le \varepsilon, \quad \forall\, x \in \mathrm{SET}.$$

2 See Section 5 for the relevant theory.

We now claim that for each $x_i$ there almost surely exists a sequence of points $x_{1,i}, x_{2,i}, x_{3,i}, \dots$
such that $x_{j,i} \in \mathrm{SET}^j$ for all $j \in \mathbb{N}$ and

$$\limsup_{j \to \infty} \|x_{j,i} - x_i\| \le \varepsilon.$$

By the triangle inequality, our claim implies that almost surely it is the case that

$$\limsup_{n \to \infty} \max_{x \in \mathrm{SET}} d(x, \mathrm{SET}^n) \le 2\varepsilon, \quad \forall\, \varepsilon > 0,$$

and since this is true for all rational $\varepsilon$, Equation (2.3) follows.

It remains to prove our claim. Proposition 5.1 shows that there exist scoring functions
$S_{x_i}$ and $S_1, \dots, S_\ell$ such that

$$x_i \in C_i := \{x : f_{S_{x_i}}(x) \ge \lambda(S_{x_i}),\ f_{S_j}(x) \le \lambda(S_j),\ (j = 1, \dots, \ell)\} \subset B_\varepsilon(x_i)$$

and $C_{n,i}$ compact for all $n \in \mathbb{N}$, where

$$C_{n,i} := \Big\{x : f_{S_{x_i}}(x) \ge \lambda_n(S_{x_i}) - \frac{\ln n}{\sqrt{n}},\ f_{S_j}(x) \le \lambda(S_j) + \frac{\ln n}{\sqrt{n}},\ (j = 1, \dots, \ell)\Big\}.$$

Further, by (1.6), the sets $C_{n,i}$ are nested, and by compactness and (1.7), we have

$$\limsup_{n \to \infty} d(x_i, C_{n,i}) \le \varepsilon. \tag{2.5}$$

We will now show that with high probability $C_{n,i}$ has a nonempty intersection with
$\mathrm{SET}^n$. Consider the events

$$\mathcal{B}_{n,i} := \Big\{\omega \in \Omega : \exists\, x \in \mathrm{SET}^n \text{ s.t. } f_{S_{x_i}}(x) \ge \lambda_n(S_{x_i}) - \frac{\ln n}{\sqrt{n}}\Big\}.$$

By Theorem 6.1 we have

$$P[\mathcal{B}_{n,i}^c] \le n^{-K_{x_i} \ln n},$$

where $K_{x_i} > 0$ is a constant that does not depend on $n$. Note also that Equation (1.14)
implies

$$\Big\{\frac{L_n(S_j)}{n} \le \lambda(S_j) + \frac{\ln n}{\sqrt{n}}\Big\} = \mathcal{E}_{n,j} := \Big\{\omega \in \Omega : f_{S_j}(x) \le \lambda(S_j) + \frac{\ln n}{\sqrt{n}},\ \forall\, x \in \mathrm{SET}^n\Big\}.$$

In conjunction with (1.6), Theorem 6.1 further implies

$$P[\mathcal{E}_{n,j}^c] \le n^{-K_j \ln(n)}.$$

But note that when the events $\mathcal{B}_{n,i}$ and $\mathcal{E}_{n,1}, \dots, \mathcal{E}_{n,\ell}$ occur jointly, then $\mathrm{SET}^n \cap C_{n,i} \ne \emptyset$
holds. The probability that the intersection is empty is thus bounded from above by

$$n^{-K_{x_i} \ln n} + \sum_{j=1}^{\ell} n^{-K_j \ln n} \le n^{-K \ln n},$$

where $K > 0$ is a constant that does not depend on $n$.
In view of the fact that the series

$$\sum_{n=1}^{\infty} n^{-K \ln n}$$

converges, the Borel-Cantelli Lemma now implies that, almost surely, for all but a finite
number of $n \in \mathbb{N}$ there exists $x_{n,i} \in \mathrm{SET}^n \cap C_{n,i}$. In the finitely many cases where
$\mathrm{SET}^n \cap C_{n,i} = \emptyset$ we can pick an arbitrary point $x_{n,i} \in \mathrm{SET}^n$ to complete the sequence.
In view of (2.5), we thus find that almost surely it is possible to construct a sequence
$(x_{n,i})_{n \in \mathbb{N}}$ with the claimed properties. Hence, this settles the theorem. □



3. Point Convergence 



So far we have established that the empirical distributions of optimal alignments of random
sequences under any scoring function asymptotically lie in SET. We will now show that
for a fixed, randomly chosen scoring function $S$, the empirical distributions of all optimal
alignments of $X$ and $Y$ under $S$ converge to a unique point in SET. Recall the notation
$p_\pi(x,y)$ introduced in Section 1.1 and let us write

$$\mathrm{SET}^*(X, Y) = \{p_\pi(X, Y) : \pi \text{ is an optimal alignment of } X \text{ and } Y\}$$

for the set of empirical distributions corresponding to optimal alignments of $X = X_1 \dots X_n$
and $Y = Y_1 \dots Y_n$. Consider the event

$$\mathcal{D}_n(p, \varepsilon) := \{\omega \in \Omega : \mathrm{SET}^*(X(\omega), Y(\omega)) \setminus B_\varepsilon(p) \ne \emptyset\}$$

that there exists an optimal alignment $\pi$ of $x = X(\omega)$ and $y = Y(\omega)$ under the scoring
function $S$ such that $\|p_\pi(x,y) - p\| > \varepsilon$.

Theorem 3.1. Let $S$ be a scoring function such that the hyperplane

$$\{x : f_S(x) = \lambda(S)\} \tag{3.1}$$

intersects SET in a unique point $p_S$, and let $\varepsilon > 0$ be given. Then there exists a constant
$K_\varepsilon > 0$ such that for all $n \in \mathbb{N}$ it is true that

$$P[\mathcal{D}_n(p_S, \varepsilon)] \le e^{-K_\varepsilon n}.$$

Furthermore, $\mathrm{SET}^*(X, Y) \to \{p_S\}$ almost surely as $n$ tends to infinity.

Proof. By Lemma 5.2, SET is a compact convex set with nonempty intersection with the
hyperplane (3.1), and by (1.15) all such intersection points are maximizers of the optimization
problem $\max_{y \in \mathrm{SET}} \langle s, y \rangle$, where $s$ is the normalization of the vector representation of
the linear functional $f_S$ defined by the scoring function. It follows that $p_S$ satisfies
Definition 5.1 of a point of strict curvature of SET. Proposition 5.1 therefore implies that there
exist finitely many scoring functions $S_1, S_2, \dots, S_k$ and thresholds $\varepsilon_0, \dots, \varepsilon_k > 0$ such that

$$\{x : f_S(x) \ge \lambda(S) - \varepsilon_0\} \cap \bigcap_{i=1}^{k} \{x : f_{S_i}(x) \le \lambda(S_i) + \varepsilon_i\} \subset B_\varepsilon(p_S). \tag{3.2}$$

Consider now the events

$$\mathcal{E}_{n,i} := \{\omega \in \Omega : \mathrm{SET}^n \subset \{x : f_{S_i}(x) \le \lambda(S_i) + \varepsilon_i\}\}.$$

By (1.14) this is equivalent to requiring that the rescaled optimal alignment score $L_n(S_i)/n$
satisfy $L_n(S_i)/n \le \lambda(S_i) + \varepsilon_i$. By Theorem 6.1 there exists $K_i > 0$ such that

$$P[\mathcal{E}_{n,i}] \ge 1 - e^{-K_i n} \quad \forall\, n. \tag{3.3}$$

Let us further define the event

$$\mathcal{E}_{n,0} := \{\omega \in \Omega : \mathrm{SET}^n \cap \{x : f_S(x) \ge \lambda(S) - \varepsilon_0\} \ne \emptyset\},$$

which is the same as requiring that $L_n(S)/n$ exceed the value $\lambda(S) - \varepsilon_0$. Corollary 6.1
once again shows that there exists $K_0 > 0$ such that

$$P[\mathcal{E}_{n,0}] \ge 1 - e^{-K_0 n} \quad \forall\, n. \tag{3.4}$$

Combining all of the above, we now find $\mathcal{D}_n(p_S, \varepsilon) \subset \bigcup_{i=0}^{k} \mathcal{E}_{n,i}^c$, so that

$$P[\mathcal{D}_n(p_S, \varepsilon)] \le \sum_{i=0}^{k} P[\mathcal{E}_{n,i}^c] \le \sum_{i=0}^{k} e^{-K_i n} \le e^{-K_\varepsilon n}$$

for some constant $K_\varepsilon > 0$, as claimed.

The last statement follows from the Borel-Cantelli Lemma in a similar construction as
in the proof of Theorem 2.1. □

The above theorem shows that if $p_S$ is the only solution to $f_S(p) = \lambda(S)$ in SET, then, denoting
any optimal alignment of $X_1 \dots X_n$ and $Y_1 \dots Y_n$ with respect to $S$ by $\pi_n$, it is true a.s.
that $p_{\pi_n}(X_1 \dots X_n, Y_1 \dots Y_n) \to p_S$. Note however that the convergence rate was not
specified. Our convergence argument, which is based on the Azuma-Hoeffding Inequality
(see Theorem 6.1), could be made quantitative if a bound on the curvature of SET at
$p_S$ were known.

Our second and main result of this section shows that the above theorem generically
applies. For this purpose we consider a scoring function $S$ that is chosen randomly
in such a way that if $\vec{S}$ denotes the normalization of the vector representation of the
linear functional $f_S$, then $\vec{S}$ has an absolutely continuous distribution with respect to the
Hausdorff measure (or uniform measure) on the sphere. In this case we say that $S$ has an
absolutely continuous distribution.

Theorem 3.2. Let the scoring function $S$ be chosen randomly from an absolutely
continuous distribution, and let $\pi_n$ denote any optimal alignment of $X_1 \dots X_n$ with $Y_1 \dots Y_n$.
Then almost surely $p_{\pi_n}(X,Y)$ converges to a unique empirical distribution.

Proof. By Theorem 5.1, the conditions of Theorem 3.1 apply with probability 1. □

4. Fluctuation of the Optimal Alignment Score 

Let $X = X_1 \dots X_n$ and $Y = Y_1 \dots Y_n$ be the random strings introduced earlier, let $a$
and $b$ be two distinct letters from the alphabet $\mathcal{A}$, and let us define a new random string
$\tilde{X} = \tilde{X}_1 \dots \tilde{X}_n$ via the following compound procedure:

(1) sample a realization $x = x_1 \dots x_n$ of $X$,

(2) if $\mathcal{J} := \{i : x_i = a\} \neq \emptyset$,

(a) let $J$ be a random index defined on some probability space $(\tilde\Omega, \tilde P)$ and taking
values with uniform distribution on $\mathcal{J}$,

(b) select a sample $j = J(\tilde\omega)$,

(c) set $\tilde{x}_j = b$ and $\tilde{x}_i = x_i$ for $i \neq j$,

(3) else set $\tilde{x} = x$.

Note that the distribution of $\tilde{X}$ generally differs from the distribution of $X$, and that,
while $X = X_1 \dots X_n$ consists of the first $n$ letters of a random sequence, the same
cannot be said about $\tilde{X}$: we only ever resample (at most) one entry of $X$ realized in the
form of an $a$, independently of $n$, so that the probability of any given index being chosen
diminishes as $n$ grows.
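The compound procedure can be sketched in a few lines. This is an illustrative sketch only: the function name `flip_one_letter` and the use of Python strings for realizations are assumptions made here, not notation from the paper.

```python
import random


def flip_one_letter(x, a, b, rng):
    """Replace one uniformly chosen occurrence of letter a in x by letter b.

    If x contains no a, the string is returned unchanged (step (3))."""
    positions = [i for i, letter in enumerate(x) if letter == a]  # the index set J
    if not positions:
        return x
    j = rng.choice(positions)  # uniform random index on J
    return x[:j] + b + x[j + 1:]


rng = random.Random(0)
x = "aabab"
x_tilde = flip_one_letter(x, "a", "b", rng)
```

Exactly one position changes whenever the letter `a` occurs, which is why the chance that a fixed index is modified shrinks as the string gets longer.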

The following result was proven by Lember and Matzinger [13], where we use the
notation

$$\tilde{L}_n(S) := \max_\pi S_\pi(\tilde{X},Y),$$

in analogy to the earlier introduced random variable $L_n(S) = \max_\pi S_\pi(X,Y)$, and where
we write $f(n) = \Theta(n)$ if there exist constants $0 < c_1 < c_2$ such that $c_1 n \le f(n) \le c_2 n$ for
all $n \in \mathbb{N}$.

Theorem 4.1. Let the scoring function $S$ and the distribution of $X$ and $Y$ be chosen so
that there exist parameters $\beta, \epsilon > 0$ for which

$$P\Big[E_{\tilde P}\big[\tilde{L}_n(S) - L_n(S) \mid X,Y\big] \ge \epsilon\Big] \ge 1 - e^{-\beta n}, \quad \forall n \in \mathbb{N}.$$

Then the order of fluctuation of the optimal alignment score is given by

$$\mathrm{VAR}[L_n(S)] = \Theta(n).$$



Up until now, the criterion of Theorem 4.1 could only be verified in a few special cases.
We will next see that when the scoring function satisfies the conditions of Theorem 3.1,
then the criterion of Theorem 4.1 can be reduced to a condition that solely depends on
$p_S$ and that can be verified by Monte Carlo simulation to high confidence:

Theorem 4.2. Let $S$ be such that $\{x : f_S(x) = \lambda(S)\}$ intersects SET in a unique point
$p_S = (p_{cd})$ and such that there exist $a, b \in \mathcal{A}$ for which it is the case that

$$\sum_{c \in \mathcal{A}^*} p_{ac}\,(S_{bc} - S_{ac}) > 0.$$

Then the order of fluctuation of the optimal alignment score is given by

$$\mathrm{VAR}[L_n(S)] = \Theta(n).$$
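To illustrate how such a condition might be checked numerically, the sketch below evaluates the sum $\sum_{c} p_{ac}(S_{bc} - S_{ac})$ that appears in the definition of $\epsilon$ in the proof. Everything here is hypothetical: the function name `fluctuation_margin`, the gap symbol `"-"`, the scoring values and the numbers in `p_estimate`, which stand in for an empirical distribution that would in practice be estimated by simulating long random sequences and aligning them optimally.

```python
GAP = "-"  # assumed gap symbol


def fluctuation_margin(p, score, a, b, symbols):
    """Sum over c of p[(a, c)] * (score[(b, c)] - score[(a, c)])."""
    return sum(p.get((a, c), 0.0) * (score[(b, c)] - score[(a, c)]) for c in symbols)


symbols = ["a", "b", GAP]
# Hypothetical scoring function: match 1, mismatch 0, letter against gap -1.
score = {(c, d): (-1.0 if GAP in (c, d) else (1.0 if c == d else 0.0))
         for c in symbols for d in symbols}
# Made-up stand-in for a Monte Carlo estimate of the limiting distribution.
p_estimate = {("a", "a"): 0.1, ("a", "b"): 0.3, ("a", GAP): 0.05}

margin = fluctuation_margin(p_estimate, score, "a", "b", symbols)
```

A clearly positive margin, together with a confidence interval for the estimate, would be the kind of evidence the Monte Carlo verification mentioned above relies on.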

Proof. Let $\mathcal{J} = \{i \in \{1, \dots, n\} : X_i = a\}$, $q_a = P[X_i = a]$, and let us define the event

Since $X$ has i.i.d. entries, McDiarmid's Inequality (see Lemma 6.3 below) implies that
for all $n \in \mathbb{N}$,

(4.1) P[jg > l-e~ n #. 
Next, let 

£ '■= ~r~ (ps, (Sbc - s ac ) c ) : = — V" Pac (s bc - s ac ) , 

where $q_a = P[X_1 = a]$. By continuity of inner products, there exists a $\delta > 0$ so that for
any $p \in B_\delta(p_S)$, we have

(4.2) -L (p, (S bc - S ac ) c ) > e. 

Recall now the notations $\mathrm{SET}^*(X,Y)$ and $\mathcal{D}^n(p,\epsilon)$ introduced in Section 3. Theorem 3.1
shows that there exists $K_\delta > 0$ such that the probability that all optimal alignments of
$X$ and $Y$ have empirical distributions that lie within a distance $\delta$ of $p_S$ satisfies

$$P[\mathrm{SET}^*(X,Y) \subset B_\delta(p_S)] = P[\mathcal{D}^n(p_S,\delta)^c] \ge 1 - e^{-K_\delta n}, \quad \forall n \in \mathbb{N}. \tag{4.3}$$

But when $\mathcal{D}^n(p_S,\delta)^c$ occurs, then for any optimal alignment $\pi^*$ of $X$ and $Y$, (4.2) holds
with $p = p_{\pi^*}(X,Y)$. Denoting the components of $p_{\pi^*}(X,Y)$ by $p^*_{cd}$, where $(c,d)$ are pairs
of letters from $\mathcal{A}^*$, we have



Ep 
> P 



L n {S)-L n {S)\\X,Y 



> e 



E P \S K (X,Y)-S K (X,Y) \\X,Y 



> £ 



(4.4) 

Therefore, 



> P 



|c/ 1 ce.4* 
~2nqa 
AJ\ 



e > e 



E P 
> P 



L n (S)-L n {3) \\X,Y 



> e 



E, 



L n {S)-L n {S) \\X,Y 



> 6 



> 1 - e _n ^ - e^", VnGN. 

Thus, the conditions of Theorem 4.1 are met for $\beta > 0$ small enough, and the claimed
order of fluctuation holds. □



5. Appendix: Convex Geometry 

We will now present geometric results required in the analysis of earlier parts of this
paper. $S^{n-1}$ will denote the unit sphere in $\mathbb{R}^n$, $B_\rho(x)$ the Euclidean ball of radius $\rho$ around
$x \in \mathbb{R}^n$, $d$ the Hausdorff distance, $\mathrm{conv}(\cdot)$ the convex hull and $\mathrm{cl}(\cdot)$ the closure of a set
in the canonical subspace topology inherited from $\mathbb{R}^n$. We say that a convex set $C \subset \mathbb{R}^n$
has dimension $k$ if its affine hull $\mathrm{aff}(C) \subset \mathbb{R}^n$ has dimension $k$.

Theorem 5.1. Let $C \subset \mathbb{R}^n$ be nonempty compact convex, and let $S : \Omega \to S^{n-1}$ be a
random vector that takes values in the unit sphere with uniform distribution, defined on
some probability space $(\Omega, \mathscr{A}, P)$. Then for almost all $\omega \in \Omega$, the optimization problem
$\arg\max_{y \in C} \langle S(\omega), y\rangle$ has a unique solution.
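A quick numerical illustration of this statement for a polytope, offered as a sketch rather than as part of the argument: for a polytope the set of maximizers of a linear functional is the face spanned by the maximizing vertices, so uniqueness amounts to a single maximizing vertex. The helper names and the tolerance `tol` are assumptions made for this example.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Uniform direction on the sphere, obtained by normalizing a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def maximizing_vertices(vertices, s, tol=1e-12):
    """Vertices of the polytope at which the functional <s, .> is maximal."""
    values = [sum(si * vi for si, vi in zip(s, v)) for v in vertices]
    best = max(values)
    return [v for v, value in zip(vertices, values) if best - value <= tol]


square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
rng = random.Random(1)
counts = [len(maximizing_vertices(square, random_unit_vector(2, rng)))
          for _ in range(1000)]
axis_count = len(maximizing_vertices(square, [1.0, 0.0]))
```

Random directions single out one vertex, while the exceptional direction $(1,0)$ is maximized along a whole edge; the exceptional directions form a null-set of the circle.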

Proof. Let us first consider the case where $C$ has nonempty interior. Upon a shift of $C$
we may assume without loss of generality that $0$ lies in the interior of $C$. Then the polar
of $C$,

$$C^\circ = \{w \in \mathbb{R}^n : \langle w, y\rangle \le 1, \ \forall y \in C\},$$

is also compact convex with nonempty interior. Since the claim of the theorem is
invariant under positive scaling, we may further assume without loss of generality that

$$B_1(0) \subset C^\circ \subset B_\rho(0).$$

Figure 1. The geometry of the Lipschitz estimate.

Next, let $s \in S^{n-1}$ be a given point on the unit sphere and consider the function

$$\tau_s : w \mapsto \max\{\tau \ge 0 : \tau w \in C^\circ\}$$

defined on the tangent space $T_sS^{n-1}$ at $s$. We claim that $\tau_s$ is Lipschitz continuous on a sufficiently
small neighbourhood $\mathcal{V}_s$ of $s$ in $T_sS^{n-1} \cap B_2(0)$. Let $w_1, w_2 \in T_sS^{n-1} \cap B_2(0)$ and $W =
\mathrm{span}\{w_1, w_2\}$. For $i = 1, 2$ we then have

$$1 \le \|w_i\| \le 2, \tag{5.1}$$

$$\tau_s(w_i) = \max\{\tau \ge 0 : \tau w_i \in C^\circ \cap W\}, \tag{5.2}$$

$$1 \le \tau_s(w_i)\,\|w_i\| \le \rho. \tag{5.3}$$

By (5.2), we may assume without loss of generality that $\mathbb{R}^n = W$ for the purposes of
proving $|\tau_s(w_1) - \tau_s(w_2)| \le L\|w_1 - w_2\|$. We refer the reader to Figure 1 for an illustration
of the geometric setup. The lines $a$ and $b$ are the tangents from $\tau_s(w_1)w_1$ to the unit sphere
$S^1$ in $W$. Denote the angle between the line $w_1w_2$ and the horizontal at $w_1$ by $\theta$, the angle
between the horizontal at $\tau_s(w_1)w_1$ and the tangents $a, b$ by $\alpha$, and the angle between the
horizontal at $w_1$ and the two tangents from $w_1$ to $S^1$ by $\beta$. Since the affine hull $\mathrm{aff}(w_1, w_2)$
cannot enter $B_1(0)$, it must lie wedged between the latter two tangents. In combination
with (5.1), this implies

$$|\theta| \le \beta = \frac{\pi}{2} - \arcsin\frac{1}{\|w_1\|} \le \frac{\pi}{2} - \arcsin\frac{1}{2}. \tag{5.4}$$

Further, (5.3) implies

$$\alpha = \frac{\pi}{2} - \arcsin\frac{1}{\tau_s(w_1)\,\|w_1\|} \le \frac{\pi}{2} - \arcsin\frac{1}{\rho}. \tag{5.5}$$

Figure 2. Bounding $\tau_s(w_2)$ by ratios.

Observe that, by convexity, the line segment between the point of tangency of $a$ at $S^1$
and $\tau_s(w_1)w_1$ lies in $C^\circ$, and further that the definition of $\tau_s(w_1)$ implies $\tau_s(w_1)w_1 \in \partial C^\circ$.
Therefore, the segment of $a$ above $\tau_s(w_1)w_1$ lies outside $C^\circ$, and it follows that

/ x \\ B \\ / —* \ WM 

(5-6) ^ < r 3 (w 2 ) < 



\C\\ ~ " x " ~ ||D||' 

see Figure 2. Let $\varphi$ be the angle between $w_1$ and $w_2$, and let us assume $\varphi < (\pi - 2\theta)/2$,
so that the intersection points $A, B, C, D$ exist. This assumption is equivalent to limiting
our analysis to a sufficiently small neighbourhood of $s$ in $T_sS^{n-1}$, as assumed earlier. We
can now express the inequalities (5.6) in terms of the angles we introduced,

$$\tau_s(w_1)\,\frac{1 - \tan\varphi\tan\beta}{1 + \tan\varphi\tan\alpha} \le \tau_s(w_2) \le \tau_s(w_1)\,\frac{1 + \tan\varphi\tan\beta}{1 - \tan\varphi\tan\alpha}.$$

This can be simplified by Taylor expansion,

$$|\tau_s(w_2) - \tau_s(w_1)| \le \tau_s(w_1)\tan\varphi\,(\tan\alpha + \tan\beta) \le \rho\tan\varphi\left(\rho\sqrt{1 - \rho^{-2}} + \sqrt{3}\right), \tag{5.7}$$

and since $\|w_1 - w_2\| \ge \|w_1\|\tan\varphi$, Equations (5.1) and (5.7) imply

$$|\tau_s(w_1) - \tau_s(w_2)| \le L\,\|w_1 - w_2\|,$$

with $L = \rho\left(\rho\sqrt{1 - \rho^{-2}} + \sqrt{3}\right)$.
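The two terms in the constant $L$ are the tangents of the largest admissible angles, namely $\tan(\pi/2 - \arcsin(1/\rho)) = \rho\sqrt{1-\rho^{-2}}$ and $\tan(\pi/2 - \arcsin(1/2)) = \sqrt{3}$. The following sketch checks these two identities numerically; the value `rho = 3.0` is an arbitrary choice made for illustration.

```python
import math


def lipschitz_constant(rho):
    """The constant L = rho * (rho * sqrt(1 - rho**-2) + sqrt(3))."""
    return rho * (rho * math.sqrt(1.0 - rho ** -2) + math.sqrt(3.0))


rho = 3.0  # arbitrary outer radius, must be at least 1
alpha_max = math.pi / 2 - math.asin(1.0 / rho)  # largest angle alpha
beta_max = math.pi / 2 - math.asin(0.5)         # largest angle beta
tan_alpha = math.tan(alpha_max)
tan_beta = math.tan(beta_max)
L = lipschitz_constant(rho)
```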

Next, having shown that $\tau_s$ is Lipschitz continuous on a sufficiently small open
neighbourhood $\mathcal{V}_s \subset T_sS^{n-1}$ of $s$, Rademacher's Theorem [TS] implies that $\tau_s$ is
Fréchet-differentiable everywhere on $\mathcal{V}_s$ except on a null-set $\mathcal{B}_s \subset \mathcal{V}_s$. We now claim that if the
optimization problem

$$x(s) = \arg\max_{y \in C} \langle s, y\rangle \tag{5.8}$$

has multiple solutions, then $\tau_s$ is Gâteaux-nondifferentiable at $s$. Since $\tau_s$ is then also
Fréchet-nondifferentiable at $s$, it must be the case that $s \in \mathcal{B}_s$. Let us thus suppose
that (5.8) has two different solutions, $x_0 \neq x_1$. Then $\langle s, x_1 - x_0\rangle = 0$, so that we have
$c_1 := \langle s, x_1\rangle = \langle s, x_0\rangle$. Furthermore, writing $c_2 := \langle x_1 - x_0, x_0\rangle$ and $c_3 := \langle x_1 - x_0, x_1\rangle$,
our assumption that $x_0 \neq x_1$ implies $c_2 \neq c_3$. Without loss of generality we may assume
that $c_2 < c_3$. For all $\xi \in \mathbb{R}$ let us define $w_\xi := s + \xi(x_1 - x_0)$ and consider the restriction
$\tau_s|_{s + \mathrm{span}(x_1 - x_0)}$, which we shall denote by $\tau(\xi) := \tau_s(w_\xi)$. Clearly, if $\tau(\xi)$ is nondifferentiable
at $\xi = 0$, then $\tau_s(w)$ is Gâteaux-nondifferentiable at $w = s$. The definition of $\tau(\xi)$ implies
$\langle \tau(\xi)w_\xi, x_j\rangle \le 1$ for $j = 0, 1$, so that

$$\tau(\xi) \le \min\left(\frac{1}{c_1 + c_2\xi}, \frac{1}{c_1 + c_3\xi}\right) = \frac{1}{c_1}\min\left(1 - \frac{c_2}{c_1}\xi + O(\xi^2),\ 1 - \frac{c_3}{c_1}\xi + O(\xi^2)\right).$$

Furthermore, we have $\tau(0) = 1/c_1$. Therefore,

$$\frac{d}{d\xi^+}\tau(0) = \lim_{\xi \downarrow 0}\frac{\tau(\xi) - \tau(0)}{\xi} \le -\frac{c_3}{c_1^2} < -\frac{c_2}{c_1^2} \le \lim_{\xi \uparrow 0}\frac{\tau(\xi) - \tau(0)}{\xi} = \frac{d}{d\xi^-}\tau(0),$$

showing that $\tau(\xi)$ is nondifferentiable at $\xi = 0$, as claimed.
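The kink can be observed numerically on a simple example. The sketch below assumes that $C$ is the square with vertices $(\pm 1, \pm 1)$, for which the gauge-type function equals the reciprocal of the support function of $C$; the function name `tau` and the step size `h` are choices made here for illustration. With $s = (1,0)$, $x_0 = (1,-1)$ and $x_1 = (1,1)$ one has $c_1 = 1$, $c_2 = -2$, $c_3 = 2$.

```python
def tau(w, vertices):
    """max{t >= 0 : t*w in the polar of conv(vertices)}, assuming 0 is interior.

    For a polytope this equals 1 / max_v <w, v> over the vertices v."""
    support = max(sum(wi * vi for wi, vi in zip(w, v)) for v in vertices)
    return 1.0 / support


square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
s = (1.0, 0.0)
x0, x1 = (1.0, -1.0), (1.0, 1.0)  # two maximizers of <s, y> over the square


def tau_restricted(xi):
    """tau along the line w_xi = s + xi * (x1 - x0)."""
    w = (s[0] + xi * (x1[0] - x0[0]), s[1] + xi * (x1[1] - x0[1]))
    return tau(w, square)


h = 1e-6
right_quotient = (tau_restricted(h) - tau_restricted(0.0)) / h
left_quotient = (tau_restricted(-h) - tau_restricted(0.0)) / (-h)
```

The one-sided difference quotients approach $-c_3/c_1^2 = -2$ from the right and $-c_2/c_1^2 = 2$ from the left, so the restriction has a kink at $\xi = 0$.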

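Next, observe that $\tau_s$ is Fréchet differentiable at $w \in \mathcal{V}_s$ if and only if the map

$$\hat\tau : S^{n-1} \to \mathbb{R}, \qquad z \mapsto \max\{\tau \ge 0 : \tau z \in C^\circ\}$$

is differentiable at $\hat{w} := w/\|w\|$, and if and only if $\tau_{\hat w}$ is differentiable at $\hat{w}$. Denoting the
spherical projections of $\mathcal{V}_s$ and $\mathcal{B}_s$ by $\hat{\mathcal{V}}_s$ and $\hat{\mathcal{B}}_s$, the compactness of $S^{n-1}$ implies the
existence of finitely many points $s_1, \dots, s_k \in S^{n-1}$ such that $\bigcup_{i=1}^{k} \hat{\mathcal{V}}_{s_i} = S^{n-1}$. Consequently,

$$\mathcal{B} := \bigcup_{i=1}^{k} \hat{\mathcal{B}}_{s_i}$$

is a null-set with the property that if Problem (5.8) has multiple solutions for a given
$s \in S^{n-1}$, then $s \in \mathcal{B}$. This proves the claim of the theorem in the case where $C$ has
nonempty interior.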

Let us now consider the general case. When $C$ consists of a singleton, the claim of the
theorem is trivial. We may thus assume that $\dim(C) \ge 1$. Upon a shift we may assume
without loss of generality that $0 \in C$. Let $W = \mathrm{span}(C)$ be the subspace spanned by $C$,
and $W^\perp$ its orthogonal complement under the Euclidean inner product of $\mathbb{R}^n$. We denote
the orthogonal projections onto these spaces by $\pi_W$ and $\pi_{W^\perp}$ respectively. Finally, let
$S_W = S^{n-1} \cap W$ be the unit sphere in $W$, and

$$\pi_S : S^{n-1} \setminus W^\perp \to S_W, \qquad s \mapsto \frac{\pi_W s}{\|\pi_W s\|}$$

the rescaled projection of $S^{n-1}$ into $W$.

The condition $\dim(C) \ge 1$ implies $\dim(W^\perp) \le n - 1$, and $\mathcal{B}_{W^\perp} = \{\omega \in \Omega : S(\omega) \in
W^\perp\}$ is a null-set. Hence, $\pi_S(s)$ is defined for almost all $s \in S^{n-1}$. Further, by isotropy of
the uniform distribution on $S^{n-1}$, the random vector

$$\pi_S(S) : \Omega \setminus \mathcal{B}_{W^\perp} \to S_W$$

is uniformly distributed on $S_W$. Since $C$ has nonempty interior in the subspace topology
of $W$, the case we already settled above applies and implies that

$$\mathcal{B}_W = \Big\{\omega \in \Omega \setminus \mathcal{B}_{W^\perp} : \arg\max_{y \in C} \langle \pi_S(S(\omega)), y\rangle \text{ is nonunique}\Big\}$$

is a null-set. Observing that for $s = S(\omega) \in S^{n-1} \setminus W^\perp$ it is the case that

$$\arg\max_{y \in C} \langle s, y\rangle = \arg\max_{y \in C} \langle \pi_S(S(\omega)), y\rangle,$$

we find that $\arg\max_{y \in C} \langle S(\omega), y\rangle$ has a unique solution if and only if $\omega$ is not in the
null-set $\mathcal{B} = \mathcal{B}_{W^\perp} \cup \mathcal{B}_W$. □

The following notion will play a key role in the sequel. 

Definition 5.1. Let $C \subset \mathbb{R}^n$ be convex compact. We say that a boundary point $x \in \partial C$
is a point of strict curvature if there exists $s \in S^{n-1}$ such that the optimization problem
$\max_{y \in C} \langle s, y\rangle$ has $x$ as unique maximizer. We denote the set of points of strict curvature
by $C_{SE}$.

Note that if $C$ has a differentiable boundary, then any point where all principal
curvatures are nonzero is a point of strict curvature. However, the set of points of strict
curvature may be larger. For example, the epigraph of the curve $x \mapsto |x|^3$ has zero
curvature at $x = 0$, but this is a point of strict curvature nonetheless under our definition.
Furthermore, Definition 5.1 also applies to points where $\partial C$ is nondifferentiable and
principal curvatures are not defined. For example, vertices of polytopes are points of strict
curvature, while points on edges (1-faces) are not. Proposition 5.1 below also provides
further intuition.

For the purposes of the next result, let us recall that the normal cone of $C$ at $x \in C$ is
defined as follows,

$$N_xC = \{s \in \mathbb{R}^n : \langle s, x - w\rangle \ge 0, \ \forall w \in C\},$$

or equivalently,

$$N_xC = \{s \in \mathbb{R}^n : x = \arg\max_{w \in C} \langle s, w\rangle\} = \{\tau s : \tau \ge 0, \ s \in S^{n-1}, \ x = \arg\max_{w \in C} \langle s, w\rangle\}. \tag{5.9}$$

By the dual description of $C$, it is the case that

$$N_xC \cap S^{n-1} \cap \mathrm{span}(C) \neq \emptyset \tag{5.10}$$

if and only if $x \in \partial C$, see also Lemma 5.1 c). We denote the set of extreme points$^3$ of $C$
by $C_E$.

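Proposition 5.1. For any $C \subset \mathbb{R}^n$ nonempty convex compact, the following hold true:

a) $x \in C_{SE}$ if and only if there exists $s \in N_xC \cap S^{n-1}$ and sequences $(\delta_k)_{k \in \mathbb{N}}, (\epsilon_k)_{k \in \mathbb{N}} \subset
\mathbb{R}_+$ such that $\epsilon_k, \delta_k \to 0$ as $k$ tends to infinity and

$$\{y \in C : N_yC \cap S^{n-1} \cap B_{\delta_k}(s) \neq \emptyset\} \subset B_{\epsilon_k}(x), \quad \forall k \in \mathbb{N}.$$

b) $C_{SE} \subseteq C_E \subseteq \mathrm{cl}(C_{SE})$.

c) $\{x \in \partial C : N_xC \cap N_yC = \mathrm{span}(C)^\perp, \ \forall y \in C \setminus \{x\}\} \subset C_{SE}$.

d) Let $x_0 \in C_{SE}$ and $s_0 \in N_{x_0}C \cap S^{n-1}$ be chosen such that $x_0$ is the unique maximizer
of $\max_{y \in C} \langle s_0, y\rangle$, and let $\epsilon > 0$ be given. Then there exist finitely many points
$x_i \in C$ and normal vectors $s_i \in N_{x_i}C \cap S^{n-1}$, $(i = 1, \dots, k)$, such that

$$C(\xi_0, \dots, \xi_k) := \{x \in \mathbb{R}^n : \langle s_0, x - x_0\rangle \ge \xi_0\} \cap \bigcap_{i=1}^{k} \{x \in \mathbb{R}^n : \langle s_i, x - x_i\rangle \le \xi_i\}$$

is compact for all $(\xi_0, \dots, \xi_k) \in \mathbb{R}^{k+1}$, and $C(0, \dots, 0) \subset B_\epsilon(x_0)$.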

Proof. a) Let $x \in C_{SE}$, and let $s \in N_xC \cap S^{n-1}$ be such that $x$ is the unique maximizer of

$$\max_{y \in C} \langle s, y\rangle. \tag{5.11}$$

Let $(\delta_k)_{k \in \mathbb{N}} \subset \mathbb{R}_+$ be a sequence such that $\delta_k \to 0$. We claim that there exists a sequence
$(\epsilon_k)_{k \in \mathbb{N}}$ with the required properties. Supposing the claim to be wrong, there exists an
$\epsilon > 0$ and sequences $(x_k)_{k \in \mathbb{N}}$, $(s_k)_{k \in \mathbb{N}}$ such that

$$x_k \in C \setminus B_\epsilon(x), \qquad s_k \in N_{x_k}C \cap S^{n-1} \cap B_{\delta_k}(s).$$

3 A point $x \in C$ is an extreme point of $C$ if it cannot be written as a convex combination of two points
$y, z \in C \setminus \{x\}$.

Since $\delta_k \to 0$, we have $s_k \to s$, and $C \setminus B_\epsilon(x)$ being compact, we may assume without loss
of generality that $x_k \to x_*$ for some $x_* \in C \setminus B_\epsilon(x)$. By (5.9), this shows that $x$ is not
the unique maximizer of (5.11) and contradicts the membership of $x$ in $C_{SE}$. Our claim
is thus correct, and this establishes the "only if" part.

To prove the converse, let us assume that $(\delta_k)_{k \in \mathbb{N}}$ and $(\epsilon_k)_{k \in \mathbb{N}}$ with the required
properties exist. By (5.9), all maximizers $x_*$ of (5.11) must satisfy $s \in N_{x_*}C$, and since
$s \in S^{n-1} \cap B_{\delta_k}(s)$, this implies $x_* \in B_{\epsilon_k}(x)$ for all $k \in \mathbb{N}$, which can only be true if $x_* = x$.
This shows that $x$ is the unique maximizer of (5.11), and hence $x \in C_{SE}$.

b) For any point $x \in C \setminus C_E$ there exist two other points $v, w \in C$ of which $x$ is a
convex combination, $x = \xi v + (1 - \xi)w$. Let $s \in N_xC \cap S^{n-1}$, so that $x$ is a maximizer
of (5.11). The existence of such an $s$ is guaranteed by (5.9). By convexity, we have
$\langle s, x\rangle \le \max(\langle s, v\rangle, \langle s, w\rangle)$. Without loss of generality, we may assume that the maximum
is achieved at $\langle s, w\rangle$, and therefore, $w$ is also a maximizer of (5.11). Since this construction
works for any $s \in N_xC \cap S^{n-1}$, this shows that $x \notin C_{SE}$, and hence, $C_{SE} \subseteq C_E$.

Next, we claim that $K_1 := \mathrm{cl}(\mathrm{conv}(C_{SE})) = C$. Assuming our claim to be wrong, there
exists $x \in C \setminus K_1$, and since $K_1 \subset C$ is compact, the Hahn-Banach Separation Theorem
then implies that there exists $s \in S^{n-1}$ such that

$$\eta := \max_{y \in K_1} \langle s, y\rangle < \xi := \langle s, x\rangle. \tag{5.12}$$

By Theorem 5.1 there exist sequences $(s_k)_{k \in \mathbb{N}} \subset S^{n-1}$ and $(y_k)_{k \in \mathbb{N}} \subset C_{SE}$ such that

$$s_k \to s \ (k \to \infty), \qquad y_k = \arg\max_{y \in C} \langle s_k, y\rangle,$$

and since $K_1$ is compact, we may further assume without loss of generality that $y_k \to
y_* \in K_1$. Using the continuity of the function $s \mapsto \max_{y \in C} \langle s, y\rangle$, we thus find that

$$\xi \le \lim_{k \to \infty} \langle s_k, y_k\rangle = \lim_{k \to \infty} \langle s_k, y_*\rangle \le \eta.$$

This contradicts (5.12) and proves our claim to be true.

Next, let $x \in C_E$. By what we know so far, there exists a sequence of points $(x_k)_{k \in \mathbb{N}} \subset
\mathrm{conv}(C_{SE})$ that converges to $x$. The Carathéodory-Steinitz Theorem [20] then implies
that each $x_k$ can be written as a convex combination

$$x_k = \sum_{i=1}^{\dim(C)+1} \xi_{i,k}\, z_{i,k}$$

of at most $\dim(C) + 1$ points $z_{i,k} \in C_{SE}$, $(i = 1, \dots, \dim(C) + 1)$. By compactness of

$$\Delta_{\dim(C)+1} \otimes \Big(\bigotimes_{i=1}^{\dim(C)+1} \mathrm{cl}(C_{SE})\Big),$$

where $\Delta_{\dim(C)+1}$ denotes the $\dim(C)$-dimensional simplex, we may assume without loss of
generality that

$$\xi_k \to \xi_* \in \Delta_{\dim(C)+1}, \qquad z_{i,k} \to z_{i,*} \in \mathrm{cl}(C_{SE}),$$

so that

$$x = \lim_{k \to \infty} x_k = \sum_{i=1}^{\dim(C)+1} \xi_{i,*}\, z_{i,*} \in \mathrm{conv}(\mathrm{cl}(C_{SE})) \subseteq C = \mathrm{cl}(\mathrm{conv}(C_{SE})).$$

Since $x$ is an extreme point of $C$, it must also be an extreme point of the subset
$\mathrm{conv}(\mathrm{cl}(C_{SE}))$. Hence, all $z_{i,*}$ must be identical. This shows that $x = z_{1,*} \in \mathrm{cl}(C_{SE})$ and
proves the inclusion $C_E \subseteq \mathrm{cl}(C_{SE})$.

c) This follows directly from Equations (5.9) and (5.10).

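d) By the dual description of $C$, we have

$$\bigcap_{x \in \partial C}\ \bigcap_{s \in N_xC \cap S^{n-1}} \{y \in \mathbb{R}^n : \langle s, y - x\rangle \le 0\} = C,$$

and since it suffices to take this intersection over a dense subset of $x \in \partial C$, we have

$$\bigcap_{x \in \partial C \setminus \{x_0\}}\ \bigcap_{s \in N_xC \cap S^{n-1}} \{y \in \mathbb{R}^n : \langle s, y - x\rangle \le 0\} = C. \tag{5.13}$$

By compactness of $C$ there exists $\rho > 0$ such that $C \subset \mathrm{cl}(B_\rho(0))$. Consider the compact
set $K_2 = \mathrm{cl}(B_\rho(0)) \setminus B_\epsilon(x_0)$. By the assumed properties of $x_0$ and $s_0$, it is further true
that

$$\{y \in \mathbb{R}^n : \langle s_0, y - x_0\rangle \ge 0\} \cap C = \{x_0\}. \tag{5.14}$$

Equations (5.13) and (5.14) now show that

$$K_2 \subset \{y \in \mathbb{R}^n : \langle s_0, y - x_0\rangle < 0\} \cup \bigcup_{x \in C \setminus \{x_0\}}\ \bigcup_{s \in N_xC \cap S^{n-1}} \{y \in \mathbb{R}^n : \langle s, y - x\rangle > 0\}.$$

By compactness of $K_2$, there exist finitely many $x_i \in C \setminus \{x_0\}$ and $s_i \in N_{x_i}C \cap S^{n-1}$,
$(i = 1, \dots, k)$, such that

$$K_2 \subset \{y \in \mathbb{R}^n : \langle s_0, y - x_0\rangle < 0\} \cup \bigcup_{i=1}^{k} \{y \in \mathbb{R}^n : \langle s_i, y - x_i\rangle > 0\}.$$

Let $e_j$ be the $j$-th coordinate vector in $\mathbb{R}^n$. Let us write $s_{j,\ell} = (-1)^\ell e_j$ for $\ell \in \{0, 1\}$, and
let

$$x_{j,\ell} \in \arg\max_{w \in C} \langle s_{j,\ell}, w\rangle.$$

Then $s_{j,\ell} \in N_{x_{j,\ell}}C$, and

$$Q := \bigcap_{\ell=0}^{1} \bigcap_{j=1}^{n} \{y \in \mathbb{R}^n : \langle s_{j,\ell}, y - x_{j,\ell}\rangle \le 0\}$$

is a cuboid. By choosing $\rho$ large enough for $Q \subset \mathrm{cl}(B_\rho(0))$ to hold, and by including the
$(x_{j,\ell}, s_{j,\ell})$ among our list of points $\{(x_i, s_i) : i = 1, \dots, k\}$ if necessary, we can guarantee
that

$$\{y \in \mathbb{R}^n : \langle s_0, y - x_0\rangle \ge 0\} \cap \bigcap_{i=1}^{k} \{y \in \mathbb{R}^n : \langle s_i, y - x_i\rangle \le 0\} \subset Q \setminus K_2 \subset B_\epsilon(x_0),$$

and that $C(\xi_0, \dots, \xi_k)$ is compact (although possibly empty) for all $(\xi_0, \dots, \xi_k)$. □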

Proposition 5.2. Let $C \subset \mathbb{R}^n$ be nonempty convex compact and $C^1, C^2, \dots$ compact
subsets of $\mathbb{R}^n$ such that $d(C^n, C) \to 0$, where $d$ denotes the Hausdorff distance. Let
$s \in S^{n-1}$ be such that the optimization problem

$$x_* = \arg\max_{x \in C} \langle s, x\rangle \tag{5.15}$$

has a unique solution. For all $n \in \mathbb{N}$ let $x_n$ be a solution of

$$x_n = \arg\max_{x \in C^n} \langle s, x\rangle.$$

Then $x_n \to x_*$ as $n$ tends to infinity.
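A simple instance of this statement, given as an illustrative sketch: take $C$ to be the closed unit disc in $\mathbb{R}^2$ and $C^n$ the inscribed regular $n$-gon, whose Hausdorff distance to the disc is $1 - \cos(\pi/n)$. The unique maximizer of $\langle s, \cdot\rangle$ over the disc is $s$ itself, and the maximizing vertex of the $n$-gon lies within chord length $2\sin(\pi/(2n))$ of it. The function names and the angle `theta` are choices made for this example.

```python
import math


def polygon_maximizer(n, s):
    """A vertex of the inscribed regular n-gon that maximizes <s, v>."""
    vertices = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
                for k in range(n)]
    return max(vertices, key=lambda v: s[0] * v[0] + s[1] * v[1])


def maximizer_gap(n, theta):
    """Distance between the polygon maximizer x_n and the disc maximizer x_* = s."""
    s = (math.cos(theta), math.sin(theta))
    x_n = polygon_maximizer(n, s)
    return math.hypot(x_n[0] - s[0], x_n[1] - s[1])


theta = 0.7  # arbitrary direction
gaps = {n: maximizer_gap(n, theta) for n in (10, 100, 1000)}
```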
Proof. Let $f(x) = \langle s, x\rangle$ be the linear functional defined by $s$. We note that

$$\liminf_{n \to \infty} \max_{y \in C^n} f(y) \ge f(x_*),$$

since for any $\epsilon > 0$ there exists $n_\epsilon$ such that for all $n \ge n_\epsilon$ there exists $y_n \in C^n \cap B_\epsilon(x_*)$,
and we have

$$f(x_*) = f(y_n) + \langle s, x_* - y_n\rangle \le f(y_n) + \epsilon. \tag{5.16}$$

Next, let $W = \mathrm{span}(C)$ and $W^\perp$ be its orthogonal complement in $\mathbb{R}^n$. Upon shifting and
rescaling, we may assume without loss of generality that $B_\rho(0) \cap W \subset C \subset B_1(0) \cap W$
for some $\rho > 0$, so that $f(x_*) \le 1$. The condition $d(C^n, C) \to 0$ then implies that for any
$\epsilon > 0$ we may take $n_\epsilon$ to be large enough so that for $n \ge n_\epsilon$,

$$C^n \subset \Big(1 + \frac{\epsilon}{\rho}\Big)C \times \big(B_\epsilon(0) \cap W^\perp\big).$$

Convexity of C and the uniqueness of x* as a maximizer of (15. 15ft imply that there exists 
5(e) > such that 



(5.17) m<f&)- 

for all x G C \ Bj( £ )(x*), and that 5(e) — > when e — > 0. Furthermore, we have 

(5.18) sup f(y)<e. 

( ET7D and (I5TT81 imply that for all £ G (1 + e/p) (C \ B s{e) (x*)) x (B £ (0) n W 1 



/(f) < U + -1 (y(f*) 



In particular, this applies to all x G C n \ (1 + e/p) B^^x*), and (15.1 6[) shows that for 
n > n e we have x n G (1 + e/p) Bs( e ) (x* ) • We conclude that 
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f„ - f n 


1 


+ 






Pj 









<(l + l), (e)+ i. 

Since we may choose $\epsilon, \delta \to 0$ when $n$ is allowed to go to infinity, the result follows. □

Next, we shall investigate the approximability of compact convex sets by polyhedra and
polytopes. Results on outer approximations by polyhedra and algorithms to achieve this
in practice are widespread in the literature on the cutting plane approach in numerical
optimization, see e.g. Bertsekas [3]. Similar results for inner approximations by polytopes
play a key role in Markov chain Monte Carlo methods for the estimation of the volume
of high dimensional convex bodies, see e.g. Jerrum [TT]. The literature in both areas is
focused on algorithms and relies on separation or membership oracles. As a result, the
constructions use outer approximations by cutting planes that do not necessarily touch
the boundary of the convex body to be approximated, and likewise, inner approximations
use generators that generally do not lie on the boundary either.

In contrast, the approximations required by our analysis have a crucial interplay with
the boundary. For outer approximations, we would like cutting hyperplanes to be
supported at points of strict curvature. Likewise, we would like inner approximations to be
generated as the convex hull of points of strict curvature. Since we are not aware of such
results appearing in the literature, we derive them from first principles.

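Lemma 5.1. Let $C \subset \mathbb{R}^n$ be a convex set with dual description

$$C = \bigcap_{s \in S^{n-1}} H(s), \tag{5.19}$$

where $H(s) = \{x : \langle s, x\rangle \le \lambda(s)\}$ for some continuous function $s \mapsto \lambda(s) \in \mathbb{R}$. Then the
following hold true:

a) $C$ is compact.

b) For any given $\epsilon > 0$, there exists a finite collection of points $s_1, \dots, s_k \in S^{n-1}$ for
which

$$\max_{x \in \bigcap_{i=1}^{k} H(s_i)} d(x, C) \le \epsilon. \tag{5.20}$$

c) For every point $x \in \partial C$, there exists $s \in S^{n-1}$ such that

$$\langle s, x\rangle = \max_{y \in C} \langle s, y\rangle = \lambda(s).$$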
Proof. a) Since $C$ is a closed subset of the compact set

$$\bigcap_{i=1}^{n} \big(H(e_i) \cap H(-e_i)\big),$$

it is itself compact.

b) The continuity of $\lambda$ and of the Hausdorff distance imply that the function $s \mapsto
d(x, H(s))$ for any fixed $x \in \mathbb{R}^n$ is continuous. The dual description (5.19) can therefore
be amended by taking the intersection over $s$ in a dense subset $\mathcal{S} \subset S^{n-1}$ only, and since
the unit sphere is a separable space, we may take $\mathcal{S}$ to be a countable set $\{s_i : i \in \mathbb{N}\}$.
Thus, we have

$$C = \bigcap_{i=1}^{\infty} H(s_i). \tag{5.21}$$

Consider the nested sets

$$G_j := \bigcap_{i=1}^{j} H(s_i). \tag{5.22}$$

Since $C$ is compact, we may furthermore order the vectors $s_i$ so that $G_j$ is compact for
all $j \ge 2n$. Our claim is clearly true if we can establish that

$$\max_{x \in G_j} d(x, C) \to 0 \quad (j \to \infty). \tag{5.23}$$

Assuming the contrary, there exists $\epsilon > 0$, a subsequence $(G_{j_i})_{i \in \mathbb{N}}$ and points $y_{j_i} \in G_{j_i}$
such that

$$d(y_{j_i}, C) \ge \epsilon, \quad \forall i \in \mathbb{N}. \tag{5.24}$$

Since all but finitely many terms of the sequence $(y_{j_i})_{i \in \mathbb{N}}$ are contained in the compact set
$G_{2n}$, there exists a convergent subsequence $(z_k)_{k \in \mathbb{N}} \subseteq (y_{j_i})_{i \in \mathbb{N}}$ with limit $z_*$. By continuity
of the distance function and by virtue of Equation (5.24), we have

$$d(z_*, C) = \lim_{k \to \infty} d(z_k, C) \ge \epsilon.$$

Since this is in direct contradiction with $z_* \in \bigcap_{i \in \mathbb{N}} H(s_i) = C$ implied by (5.21), our claim
is true.

c) If $C = \emptyset$, there is nothing to prove. Hence, we may assume that $C$ is nonempty.
Without loss of generality, we may furthermore assume that $C$ is shifted so that it contains
the origin and $\lambda(s) \ge 0$ for all $s \in S^{n-1}$. Let $x \in \partial C$. Then, by the dual description
(5.19),

$$\langle s, x\rangle \le \max_{y \in C} \langle s, y\rangle \le \lambda(s)$$

for all $s \in S^{n-1}$. Thus, it suffices to prove the existence of $s$ such that $\langle s, x\rangle \ge \lambda(s)$.
If $x = 0$, then we may assume without loss of generality that $C$ has empty interior, as
any interior point could otherwise be shifted to the origin. By convexity, $C$ then lies
in a lower-dimensional subspace, and it suffices to take $s \in \mathrm{span}(C)^\perp$. If $x \neq 0$, then
$(1 + 1/k)x \notin C$ for all $k \in \mathbb{N}$. By virtue of the Hahn-Banach Theorem there exist unit
vectors $s_k \in S^{n-1}$ such that

$$\langle s_k, x\rangle > \frac{\lambda(s_k)}{1 + \frac{1}{k}}. \tag{5.25}$$

$S^{n-1}$ being compact, we can extract a convergent subsequence and assume without loss
of generality that $s_k \to s_* \in S^{n-1}$, as $k \to \infty$. By continuity of $\lambda$, (5.25) implies

$$\langle s_*, x\rangle \ge \lambda(s_*).$$

□

Theorem 5.2. Let $C$ be as in Lemma 5.1 and nonempty. Then the points $s_i$ in part b)
of Lemma 5.1 can be chosen so that $x_i = \arg\max_{y \in C} \langle s_i, y\rangle$ is unique for all $i$, that is, $x_i$
are points of strict curvature.

Proof. By Theorem 5.1 the set

$$\mathcal{S} = \Big\{s \in S^{n-1} : \arg\max_{y \in C} \langle s, y\rangle \in C_{SE}\Big\}$$

has full measure and is therefore everywhere dense in $S^{n-1}$. Since $\mathrm{cl}(\mathcal{S}) = S^{n-1}$ is
separable, there exists furthermore a countable subset $\{s_i : i \in \mathbb{N}\} \subset \mathcal{S}$, also everywhere dense
in $S^{n-1}$, that can be used in the construction. □

Theorem 5.3. Let $C \subset \mathbb{R}^n$ be nonempty compact convex. Then for all $\epsilon > 0$ there exist
finitely many points of strict curvature $x_1, \dots, x_k \in C_{SE}$ such that

$$\max_{x \in C} d(x, \mathrm{conv}(x_1, \dots, x_k)) \le \epsilon.$$

Proof. Let $\{x_1, \dots, x_k\} \subset C_{SE}$ be an $\epsilon$-net on the set $C_E$ of extreme points of $C$, that is,
$x_i$ are chosen so that

$$\min_i \|z - x_i\| \le \epsilon$$

for all $z \in C_E$. The existence of such an $\epsilon$-net is established as follows: $C$ being compact,
$\mathrm{cl}(C_{SE})$ is a compact set too, and by the Heine-Borel Theorem, we can extract a finite
covering by Euclidean balls of radius $\epsilon/2$ around points $y_i \in \mathrm{cl}(C_{SE})$, which by Proposition
5.1 b) is also a covering of $C_E$,

$$\bigcup_{i=1}^{k} B_{\epsilon/2}(y_i) \supseteq C_E.$$

Next, for all $i$ choose $x_i \in C_{SE}$ within distance $\epsilon/2$ of $y_i$. It then follows from the triangle
inequality that $\{x_1, \dots, x_k\}$ is the required $\epsilon$-net.

By the Theorems of Minkowski$^4$ [lTj and Carathéodory$^5$ |">J, any point $x \in C$ can be
written as a convex combination $x = \xi_1 z_1 + \dots + \xi_m z_m$ of $m \le n + 1$ extreme points
$z_j \in C_E$, and by construction of the $\epsilon$-net, it is then possible to choose $1 \le i_j \le k$ such
that $\|z_j - x_{i_j}\| \le \epsilon$ for all $j$. Using the triangle inequality once again, we find that

$$d(x, \mathrm{conv}(x_1, \dots, x_k)) \le d(x, \mathrm{conv}(x_{i_1}, \dots, x_{i_m})) \le d(\xi_1 z_1 + \dots + \xi_m z_m,\ \xi_1 x_{i_1} + \dots + \xi_m x_{i_m}) \le \epsilon,$$

as claimed. □
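For the closed unit disc in $\mathbb{R}^2$ the statement can be made fully explicit, which gives a feel for how many points of strict curvature are needed. Every boundary point of the disc is a point of strict curvature, and the farthest point of the disc from an inscribed regular $k$-gon sits midway between two neighbouring vertices, at distance $1 - \cos(\pi/k)$ (one minus the apothem). The sketch below uses this elementary formula; the function names are chosen here for illustration.

```python
import math


def inner_gap(k):
    """max over the unit disc of the distance to the inscribed regular k-gon."""
    return 1.0 - math.cos(math.pi / k)


def points_needed(eps):
    """Smallest k >= 3 for which the inscribed regular k-gon is eps-close to the disc."""
    k = 3
    while inner_gap(k) > eps:
        k += 1
    return k


k_for_one_percent = points_needed(0.01)
```

Since $1 - \cos(\pi/k) \approx \pi^2/(2k^2)$, the number of points grows like $\epsilon^{-1/2}$ in this two-dimensional example.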

The next result can be seen as an intuitive template for Theorem 2.1 free of large
deviations complications.

Theorem 5.4. Let $C$ be a nonempty convex compact subset of $\mathbb{R}^n$ with dual
description (5.19), and let $C^1, C^2, \dots$ be convex compact subsets of $\mathbb{R}^n$ such that for all linear
functionals $f : \mathbb{R}^n \to \mathbb{R}$ it is true that

$$\max_{p \in C^n} f(p) \xrightarrow{n \to \infty} \max_{p \in C} f(p). \tag{5.26}$$

Then $d(C^n, C) \to 0$.

4 This theorem says that a convex compact set in $\mathbb{R}^n$ is equal to the convex hull of its extreme points.
The generalization of this result to arbitrary topological vector spaces is the Krein-Milman Theorem
P3EEI5].

5 This result says that if $K = \mathrm{conv}(X)$ for some $X \subset \mathbb{R}^n$, then every point in $K$ is a convex combination
of at most $n + 1$ elements of $X$.

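Proof. Upon translation and rescaling we may assume without loss of generality that
$0 \in C \subset B_1(0)$. Let $\epsilon > 0$ be a given small number. By the dual description of $C$ and
Lemma 5.1 there exist finitely many vectors $s_1, \dots, s_k \in S^{n-1}$ such that

$$\sup_{x \in C_\epsilon} d(x, C) < \epsilon, \tag{5.27}$$

where $C_\epsilon = \bigcap_{i=1}^{k} \{x : \langle s_i, x\rangle \le \lambda_{s_i}\}$ and $\lambda_s = \max_{x \in C} \langle s, x\rangle \ge 0$. Because of
Assumption (5.26), there exists $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$,

$$\max_{x \in C^n} \langle s_i, x\rangle \le (1 + \epsilon)\lambda_{s_i}, \quad (i = 1, \dots, k),$$

and hence, $C^n \subset (1 + \epsilon)C_\epsilon$. Therefore, for all $n \ge n_0$,

$$\sup_{x \in C^n} d(x, C) \le \sup_{x \in (1+\epsilon)C_\epsilon} d(x, C) = (1 + \epsilon) \sup_{x \in C_\epsilon} d\Big(x, \frac{1}{1+\epsilon}C\Big)$$

$$\le (1 + \epsilon)\Big(\sup_{x \in C_\epsilon} d(x, C) + \sup_{x \in C} d\Big(x, \frac{1}{1+\epsilon}C\Big)\Big) \le 2(1 + \epsilon)\epsilon, \tag{5.28}$$

where we used (5.27) and $d(C, \tfrac{1}{1+\epsilon}C) \le \epsilon$ to arrive at the last inequality.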

Next, let us choose points x\, . . . , Xk as in Theorem 15.31 Proposition 15.11 allows us to 
choose scoring functions and Si, . . . , Sg such that Xi G C, C B e (xi) and C\ is compact 
for all j 6 N, where 



Q := \x : / 5jf . (£) > max f Sgm (x) , f Sj (x) < max f s . (x) , (j = 1, . . . , £) 
C? : = i # : fs s . (x) > max f s (x) , f s (x) < max f s (x) ,(j = !,...,£) 



X&& 

Taking Xy to be a maximizer of argmax^ gCJ we find G C J ' C\C\ ^ 0, and by (I5.26P 
applied to f$ s . and fsj (j = 1, . . . ,k) and the continuity of the Hausdorff distance, we 
have 

limsup \\xi — Xij\\ < e. 
By the triangle inequality, this implies

$$\limsup_{n \to \infty} \max_{x \in C} d(x, C^n) \le 2\epsilon, \quad \forall \epsilon > 0,$$

and since this is true for all $\epsilon$ rational, the Theorem follows. □

The final result shows among other things that all results presented above are
applicable to SET, defined as in (1.10). Recall the notations $\lambda(S)$ for the Chvátal-Sankoff
limit (JTTTJ) of a scoring function $S$, and $f_S$, the linear functional associated with $S$ as
defined in (1.1).

Lemma 5.2. 

a) The function S t— > X(S) is continuous. 

b) SET is nonempty convex compact. 

c) For every x G dSET there exists Sg =fi such that {y : fs g (y) = X(Sg)} is a 
tangent plane to SET supported at x. 

d) m&x^sET fs (^) = X(S) holds true for all scoring functions S. 

Proof. a) It suffices to show that for any two scoring functions $S$ and $T$ the following
inequality holds true,

\[ \lambda_n(S) - \lambda_n(T) \le 2\|T - S\|_\infty. \]

For any pair of letters $c, d \in A^*$ we have

\[ |S(c,d) - T(c,d)| \le \|S - T\|_\infty = \max_{a,b \in A^*} |S(a,b) - T(a,b)|. \]

Further, since no optimal alignment of two strings of length $n$ contains any aligned pair
of gaps, there are at most $2n$ aligned letter pairs, so that the triangle inequality implies

\[ |L_n(S) - L_n(T)| \le 2n\|S - T\|_\infty. \]

The claim now follows by dividing by $n$ and taking expectations. Lemma 5.1 now applies
because SET has the dual description (1.10), where $S$ can be restricted to the case where
$f_S$ is a unit vector, since for all $\tau > 0$ and pairs of strings $(x,y)$, we have $(\tau S)_\pi(x,y) =
\tau S_\pi(x,y)$, whence $\lambda(\tau S) = \tau\lambda(S)$ and $H(\tau S) = H(S)$.
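The Lipschitz bound $|L_n(S) - L_n(T)| \le 2n\|S - T\|_\infty$ can be checked numerically on small instances. The following is a minimal sketch, not the authors' implementation. It assumes that an alignment is a sequence of columns of the form (letter, letter), (letter, gap) or (gap, letter), and that its score is the sum of the scores of its columns. The gap symbol `GAP`, the helper names and the random scoring functions are hypothetical.

```python
import random

GAP = "-"  # hypothetical stand-in for the gap symbol


def optimal_score(x, y, S):
    """Maximal alignment score of x and y (assumed column-sum scoring model)."""
    m, n = len(x), len(y)
    L = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # prefix of x aligned with gaps only
        L[i][0] = L[i - 1][0] + S[(x[i - 1], GAP)]
    for j in range(1, n + 1):  # prefix of y aligned with gaps only
        L[0][j] = L[0][j - 1] + S[(GAP, y[j - 1])]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            L[i][j] = max(
                L[i - 1][j - 1] + S[(x[i - 1], y[j - 1])],  # letter with letter
                L[i - 1][j] + S[(x[i - 1], GAP)],           # letter of x with gap
                L[i][j - 1] + S[(GAP, y[j - 1])],           # gap with letter of y
            )
    return L[m][n]


def random_scoring(alphabet, rng):
    """Random scoring function on all symbol pairs except (gap, gap)."""
    symbols = list(alphabet) + [GAP]
    return {(a, b): rng.uniform(-1.0, 1.0)
            for a in symbols for b in symbols if (a, b) != (GAP, GAP)}


def sup_distance(S, T):
    """The sup-norm distance between two scoring functions."""
    return max(abs(S[pair] - T[pair]) for pair in S)


rng = random.Random(0)
alphabet = "ab"
n = 30
x = [rng.choice(alphabet) for _ in range(n)]
y = [rng.choice(alphabet) for _ in range(n)]
S = random_scoring(alphabet, rng)
T = random_scoring(alphabet, rng)

score_gap = abs(optimal_score(x, y, S) - optimal_score(x, y, T))
lipschitz_bound = 2 * n * sup_distance(S, T)
```

Under the stated model, `score_gap` never exceeds `lipschitz_bound`, because every alignment of two strings of length $n$ has at most $2n$ columns.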






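b) By Part a), Lemma 5.1 is applicable to $C = \mathrm{SET}$. This shows that SET is convex and
compact. It remains to prove that it contains at least one point. Consider the random
sequences $X$, $Y$ introduced earlier, and for all $n \in \mathbb{N}$ let the first $n$ letters be
aligned by $\pi_n$, defined as follows,

\[
\begin{array}{c|cccccccc}
X & X_1 & \mathfrak{G} & X_2 & \mathfrak{G} & X_3 & \cdots & X_n & \mathfrak{G} \\
\hline
Y & \mathfrak{G} & Y_1 & \mathfrak{G} & Y_2 & \mathfrak{G} & \cdots & \mathfrak{G} & Y_n
\end{array}
\]

where $\mathfrak{G}$ denotes a gap. For all $n \in \mathbb{N}$, the expected empirical distribution of aligned
letters is given by the vector $x$ with entries defined as follows,

\[
x_{ab} =
\begin{cases}
P[X_1 = a] & \text{if } a \neq \mathfrak{G},\ b = \mathfrak{G}, \\
P[Y_1 = b] & \text{if } a = \mathfrak{G},\ b \neq \mathfrak{G}, \\
0 & \text{otherwise.}
\end{cases}
\]

For every scoring function $S$ this is a legitimate, albeit suboptimal, alignment. Therefore,
we have

\[ f_S(x) = \frac{1}{n} E[S_{\pi_n}(X,Y)] \le \frac{1}{n} E[L_n(S)] \le \lambda(S). \]

By (1.10), this shows that $x \in \mathrm{SET}$.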
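To make the construction in Part b) concrete, the following sketch builds the vector $x$ for a toy letter distribution and evaluates $f_S(x)$. It assumes that the linear functional has the form $f_S(x) = \sum_{a,b} S(a,b)\, x_{ab}$, which is our reading of (1.1). The gap symbol `GAP`, the letter distribution `p` and the scoring values are hypothetical.

```python
GAP = "-"  # hypothetical stand-in for the gap symbol


def all_gaps_vector(p_x, p_y):
    """Expected empirical distribution when every letter is aligned with a gap."""
    vec = {(a, GAP): prob for a, prob in p_x.items()}
    vec.update({(GAP, b): prob for b, prob in p_y.items()})
    return vec  # all other entries are zero and therefore omitted


def f_S(S, vec):
    """Assumed form of the linear functional: sum of S(a, b) * vec[a, b]."""
    return sum(S[pair] * mass for pair, mass in vec.items())


def all_gaps_score(x, y, S):
    """Score of the alignment that pairs each letter of x and y with a gap."""
    return sum(S[(a, GAP)] for a in x) + sum(S[(GAP, b)] for b in y)


p = {"a": 0.25, "b": 0.75}  # toy letter distribution for both sequences
S = {
    ("a", "a"): 1.0, ("b", "b"): 1.0, ("a", "b"): -1.0, ("b", "a"): -1.0,
    ("a", GAP): -0.5, ("b", GAP): -0.5, (GAP, "a"): -0.5, (GAP, "b"): -0.5,
}

vec = all_gaps_vector(p, p)
total_mass = sum(vec.values())  # the entries sum to 2, not 1
value = f_S(S, vec)
# All gap scores coincide in this toy S, so the per-letter score of the
# all-gaps alignment is the same for any pair of strings of equal length.
sample_value = all_gaps_score("abba", "baab", S) / 4
```

The total mass of 2 reflects the scaling factor caused by gaps: this alignment has $2n$ aligned pairs, each counted with weight $1/n$.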



c) This follows from Lemma 5.1, which is applicable by Part a).



d) Note that it follows from (TOD and f TiTU]) that f(x) < X(S) for all x. It suffices thus 
to show that the hyperplane 



\[ T := \{x : f_S(x) = \lambda(S)\} \]

has nonempty intersection with SET. Assuming the contrary, and using the construction
of (5.21) and (5.22), there exists $j \in \mathbb{N}$ such that $T \cap G_j = \emptyset$. By continuity, there then
exists $\delta > 0$ such that

\[ T^\delta \cap G_j^\delta = \emptyset, \tag{5.29} \]

where

\[
T^\delta := \{x : f_S(x) \ge \lambda(S) - \delta\}, \qquad
G_j^\delta := \bigcap_{i=1}^{j} H^\delta(S_i), \qquad
H^\delta(S_i) := \{x : f_{S_i}(x) \le \lambda(S_i) + \delta\}.
\]

By (1.14) and the almost sure convergence of $L_n(S_i)/n$ to $\lambda(S_i)$, there exists almost surely
$n_0 \in \mathbb{N}$ such that for all $n \ge n_0$,

\[ \max_{x \in \mathrm{SET}^n} f_{S_i}(x) < \lambda(S_i) + \delta, \quad (i = 1, \dots, j), \]

so that $\mathrm{SET}^n \subset G_j^\delta$, and further,

\[ \max_{x \in \mathrm{SET}^n} f_S(x) > \lambda(S) - \delta, \]

so that $\emptyset \neq T^\delta \cap \mathrm{SET}^n \subset T^\delta \cap G_j^\delta$. Since this is in direct contradiction with (5.29), the
claim holds true. □



6. Appendix: Large Deviations 

Recall the notations $L_S(x_1 \dots x_i, y_1 \dots y_j)$, $L_n(S) = L_S(X_1 \dots X_n, Y_1 \dots Y_n)$ and $\lambda_n(S) =
E[L_n(S)]/n$ introduced in Section 1.1, and the fact that $\lambda_n(S) \to \lambda(S)$ mentioned in (1.7).
In this appendix we will show a stronger result that quantifies the convergence rate as
being of order $O(\sqrt{\ln n / n})$. For this purpose, we introduce the following notation,

\[
\|S\|_\delta := \max_{c,d,e \in A^*} |S(c,d) - S(c,e)|, \qquad
\|S\|_\infty := \max_{c,d \in A^*} |S(c,d)|.
\]



Lemma 6.1. Let $x = x_1 \dots x_m$ and $y = y_1 \dots y_n$ be two given strings with letters from the
alphabet $A$, and let $S$ be a given scoring function. Let further $\tilde{x} \in A$, and consider two
amendments of string $x$: $x^{[i]} = x_1 \dots x_{i-1}\, \tilde{x}\, x_{i+1} \dots x_m$, obtained by replacing an arbitrary
letter $x_i$ by $\tilde{x}$, and $x^{[+]} = x_1 \dots x_m \tilde{x}$, obtained by extending $x$ by a letter $\tilde{x}$. Then the
following hold true,

\[ |L_S(x^{[i]}, y) - L_S(x, y)| \le \|S\|_\delta, \tag{6.1} \]

\[ |L_S(x^{[+]}, y) - L_S(x, y)| \le \|S\|_\infty. \tag{6.2} \]

Proof. Let $\pi$ be an optimal alignment of $x$ and $y$, so that $S_\pi(x,y) = L_S(x,y)$, and denote
the letter with which $x_i$ is aligned under $\pi$ by $a \in A^*$. Then

\[ L_S(x^{[i]}, y) \ge S_\pi(x^{[i]}, y) = S_\pi(x,y) - S(x_i, a) + S(\tilde{x}, a) \ge L_S(x,y) - \|S\|_\delta. \]

Applying the identical argument to an optimal alignment of $x^{[i]}$ and $y$, we obtain the
analogous inequality

\[ L_S(x,y) \ge L_S(x^{[i]}, y) - \|S\|_\delta, \]

so that (6.1) follows.

For the second claim, let us use an optimal alignment $\pi$ of $x$ and $y$ to construct an
alignment $\pi^{[+]}$ of $x^{[+]}$ and $y$ by appending an aligned pair of letters $(\tilde{x}, \mathfrak{G})$, where $\mathfrak{G}$
denotes a gap. Then we have

\[ L_S(x^{[+]}, y) \ge S_{\pi^{[+]}}(x^{[+]}, y) = S_\pi(x,y) + S(\tilde{x}, \mathfrak{G}) \ge L_S(x,y) - \|S\|_\infty. \]

Conversely, we can amend an optimal alignment $\pi^{[+]}$ of $x^{[+]}$ and $y$ to become a valid
alignment $\hat{\pi}$ of $x$ and $y$ by cropping the last pair of aligned letters, $(\tilde{x}, a)$. We then have

\[ L_S(x,y) \ge S_{\hat{\pi}}(x,y) = S_{\pi^{[+]}}(x^{[+]}, y) - S(\tilde{x}, a) \ge L_S(x^{[+]}, y) - \|S\|_\infty, \]

thus establishing (6.2). □
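The single-letter replacement bound (6.1) is the bounded-difference property that feeds into McDiarmid's Inequality below, and it can be verified exhaustively on small random instances. The sketch below is illustrative only. It assumes the column-sum scoring model (columns are letter with letter, letter with gap, or gap with letter) and a symmetric scoring function, so that it does not matter which argument of $S$ is varied in $\|S\|_\delta$. The names `GAP`, `symmetric_scoring` and `delta_norm` are hypothetical.

```python
import random

GAP = "-"  # hypothetical stand-in for the gap symbol


def optimal_score(x, y, S):
    """Maximal alignment score of x and y (assumed column-sum scoring model)."""
    m, n = len(x), len(y)
    L = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        L[i][0] = L[i - 1][0] + S[(x[i - 1], GAP)]
    for j in range(1, n + 1):
        L[0][j] = L[0][j - 1] + S[(GAP, y[j - 1])]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            L[i][j] = max(
                L[i - 1][j - 1] + S[(x[i - 1], y[j - 1])],
                L[i - 1][j] + S[(x[i - 1], GAP)],
                L[i][j - 1] + S[(GAP, y[j - 1])],
            )
    return L[m][n]


def symmetric_scoring(alphabet, rng):
    """Random symmetric scoring function; the pair (gap, gap) is left undefined."""
    symbols = list(alphabet) + [GAP]
    S = {}
    for i, a in enumerate(symbols):
        for b in symbols[i:]:
            if (a, b) != (GAP, GAP):
                S[(a, b)] = S[(b, a)] = rng.uniform(-1.0, 1.0)
    return S


def delta_norm(S, alphabet):
    """Largest score change when one letter is swapped against a fixed partner."""
    symbols = list(alphabet) + [GAP]
    return max(abs(S[(c, e)] - S[(d, e)])
               for c in alphabet for d in alphabet for e in symbols)


rng = random.Random(1)
alphabet = "abc"
violations = 0
for _ in range(20):
    S = symmetric_scoring(alphabet, rng)
    bound = delta_norm(S, alphabet)
    x = [rng.choice(alphabet) for _ in range(10)]
    y = [rng.choice(alphabet) for _ in range(10)]
    base = optimal_score(x, y, S)
    for i in range(len(x)):
        for letter in alphabet:
            x_replaced = x[:i] + [letter] + x[i + 1:]  # the string x^{[i]}
            if abs(optimal_score(x_replaced, y, S) - base) > bound + 1e-9:
                violations += 1
```

No violations are expected, since any alignment of $x$ and $y$ is also an alignment of $x^{[i]}$ and $y$ whose score differs in exactly one column.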

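Lemma 6.2. The convergence of $\lambda_n(S)$ to $\lambda(S)$ is governed by the inequality

\[ \lambda_n(S) \le \lambda(S) \le \lambda_n(S) + c_n \|S\|_\delta \frac{\sqrt{\ln n}}{\sqrt{n}} + \frac{2\|S\|_\infty}{n}, \quad \forall n \in \mathbb{N}, \tag{6.3} \]

where

\[ c_n = \frac{\sqrt{2\ln 3 + 2\ln(n+2)}}{\sqrt{\ln n}}. \]

Note that $c_n$ tends to $\sqrt{2}$ when $n \to \infty$, so that it effectively acts as a constant.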



Proof. See 0. 



□

Lemma 6.3 (McDiarmid's Inequality [15]). Let $Z_1, Z_2, \dots, Z_m$ be i.i.d. random variables
that take values in a set $D$, and let $g : D^m \to \mathbb{R}$ be a function of $m$ variables with the
property that

\[ \max_{i=1,\dots,m} \ \sup_{z \in D^m,\ \hat{z}_i \in D} |g(z_1, \dots, z_m) - g(z_1, \dots, z_{i-1}, \hat{z}_i, z_{i+1}, \dots, z_m)| \le C. \]

Thus, changing a single argument of $g$ changes its image by at most a constant $C$. Then
the following bounds hold,

\[ P\left[ g(Z_1, \dots, Z_m) - E[g(Z_1, \dots, Z_m)] \ge \varepsilon \times m \right] \le \exp\left\{ -\frac{2\varepsilon^2 m}{C^2} \right\}, \]

\[ P\left[ E[g(Z_1, \dots, Z_m)] - g(Z_1, \dots, Z_m) \ge \varepsilon \times m \right] \le \exp\left\{ -\frac{2\varepsilon^2 m}{C^2} \right\}. \]

Proof. A consequence of the Azuma-Hoeffding Inequality, see [15]. □
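As a sanity check of the form of the bound, consider the simplest function with bounded differences: $g(z_1, \dots, z_m) = z_1 + \dots + z_m$ with $Z_i$ uniform on $D = \{0, 1\}$, so that $C = 1$. The sketch below compares the exact upper tail of this sum with the bound $\exp\{-2\varepsilon^2 m / C^2\}$. This toy choice of $g$ is ours and is not taken from the paper; the function names are hypothetical.

```python
import math


def mcdiarmid_bound(eps, m, C):
    """Right-hand side exp(-2 eps^2 m / C^2) of McDiarmid's Inequality."""
    return math.exp(-2.0 * eps ** 2 * m / C ** 2)


def exact_upper_tail(m, eps):
    """P[sum(Z) - m/2 >= eps * m] for Z_1..Z_m i.i.d. uniform on {0, 1}."""
    threshold = m / 2 + eps * m
    k_min = math.ceil(threshold - 1e-12)  # small guard against rounding error
    favourable = sum(math.comb(m, k) for k in range(k_min, m + 1))
    return favourable / 2 ** m


# Compare the exact tail with the bound for a few sample sizes and deviations.
comparison = {
    (m, eps): (exact_upper_tail(m, eps), mcdiarmid_bound(eps, m, C=1.0))
    for m in (50, 100, 400)
    for eps in (0.05, 0.1, 0.2)
}
```

In every entry of `comparison` the exact tail lies below the bound, and both decay exponentially in $m$ for fixed $\varepsilon$, which is the behaviour exploited in Theorem 6.1.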



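Theorem 6.1. For fixed $\varepsilon > 0$ and scoring function $S$ there exist $K > 0$ and $n_\varepsilon \in \mathbb{N}$
such that

\[ P\left[ \frac{L_n(S)}{n} \ge \lambda(S) + \varepsilon \right] \le e^{-Kn}, \quad \forall n \in \mathbb{N}, \tag{6.4} \]

\[ P\left[ \frac{L_n(S)}{n} \le \lambda_n(S) - \varepsilon \right] \le e^{-Kn}, \quad \forall n \in \mathbb{N}, \tag{6.5} \]

\[ P\left[ \frac{L_n(S)}{n} \le \lambda(S) - \varepsilon \right] \le e^{-Kn}, \quad \forall n \ge n_\varepsilon. \tag{6.6} \]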



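Proof. We know from Lemma 6.1 that

\[ g(X_1, \dots, X_n, Y_1, \dots, Y_n) = L_S(X_1 \dots X_n, Y_1 \dots Y_n) = L_n(S) \]

satisfies the assumptions of Lemma 6.3 with $m = 2n$ and $C = \|S\|_\delta$. McDiarmid's
Inequality therefore shows

\[ P\left[ \frac{L_n(S)}{n} \ge \lambda_n(S) + \varepsilon \right] = P\left[ L_n(S) \ge E[L_n(S)] + \frac{\varepsilon}{2} \times 2n \right] \le \exp\left\{ -\frac{\varepsilon^2}{\|S\|_\delta^2} \times n \right\}, \tag{6.7} \]

and similarly,

\[ P\left[ \frac{L_n(S)}{n} \le \lambda_n(S) - \varepsilon \right] \le \exp\left\{ -\frac{\varepsilon^2}{\|S\|_\delta^2} \times n \right\}. \tag{6.8} \]

Claim (6.5) therefore holds with $K = \varepsilon^2 / (4\|S\|_\delta^2)$.

Furthermore, Lemma 6.2 established that

\[ \lambda_n(S) \le \lambda(S) \le \lambda_n(S) + c_n \|S\|_\delta \frac{\sqrt{\ln n}}{\sqrt{n}} + \frac{2\|S\|_\infty}{n}, \quad \forall n \in \mathbb{N}, \tag{6.9} \]

holds, where $c_n = \sqrt{2\ln 3 + 2\ln(n+2)} / \sqrt{\ln n}$. Using the first inequality from (6.9) in
conjunction with (6.7), we find

\[ P\left[ \frac{L_n(S)}{n} \ge \lambda(S) + \varepsilon \right] \le P\left[ \frac{L_n(S)}{n} \ge \lambda_n(S) + \varepsilon \right] \le \exp\left\{ -\frac{\varepsilon^2}{\|S\|_\delta^2} \times n \right\}, \]

which shows that Claim (6.4) holds with $K = \varepsilon^2 / (4\|S\|_\delta^2)$.

Using now the second inequality from (6.9) in conjunction with (6.8), we find

\[
\begin{aligned}
P\left[ \frac{L_n(S)}{n} \le \lambda(S) - \varepsilon \right]
&\le P\left[ \frac{L_n(S)}{n} \le \lambda_n(S) - \left( \varepsilon - c_n \|S\|_\delta \frac{\sqrt{\ln n}}{\sqrt{n}} - \frac{2\|S\|_\infty}{n} \right) \right] \\
&\le \exp\left\{ -\frac{\left( \varepsilon - c_n \|S\|_\delta \frac{\sqrt{\ln n}}{\sqrt{n}} - \frac{2\|S\|_\infty}{n} \right)^2}{\|S\|_\delta^2} \times n \right\} \\
&\le \exp\left\{ -\frac{\varepsilon^2}{4\|S\|_\delta^2} \times n \right\},
\end{aligned}
\]

where $n_\varepsilon \in \mathbb{N}$ is chosen large enough to satisfy

\[ \varepsilon - c_n \|S\|_\delta \frac{\sqrt{\ln n}}{\sqrt{n}} - \frac{2\|S\|_\infty}{n} \ge \frac{\varepsilon}{2}, \quad \forall n \ge n_\varepsilon. \]

This shows that (6.6) holds for $K = \varepsilon^2 / (4\|S\|_\delta^2)$.

□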
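The threshold $n_\varepsilon$ in the proof is defined only implicitly, but it is easy to compute for given values of $\varepsilon$, $\|S\|_\delta$ and $\|S\|_\infty$. The following sketch searches for the smallest admissible $n \ge 2$. It relies on our observation that $c_n \sqrt{\ln n}/\sqrt{n} = \sqrt{(2\ln 3 + 2\ln(n+2))/n}$ is decreasing in $n$, so that the condition, once satisfied, remains satisfied. The function names and the numerical inputs are hypothetical.

```python
import math


def c_n(n):
    """The factor c_n of Lemma 6.2, defined for n >= 2."""
    return math.sqrt(2 * math.log(3) + 2 * math.log(n + 2)) / math.sqrt(math.log(n))


def slack(n, eps, norm_delta, norm_inf):
    """eps minus the two correction terms from the second inequality in (6.9)."""
    return eps - c_n(n) * norm_delta * math.sqrt(math.log(n) / n) - 2 * norm_inf / n


def smallest_n_eps(eps, norm_delta, norm_inf, n_max=10 ** 12):
    """Smallest n >= 2 such that slack(n) >= eps / 2 (slack increases with n)."""
    hi = 2
    while slack(hi, eps, norm_delta, norm_inf) < eps / 2:
        hi *= 2  # doubling phase
        if hi > n_max:
            raise ValueError("no admissible n below n_max")
    if hi == 2:
        return 2
    lo = hi // 2  # the condition fails at lo and holds at hi
    while hi - lo > 1:  # bisection phase
        mid = (lo + hi) // 2
        if slack(mid, eps, norm_delta, norm_inf) >= eps / 2:
            hi = mid
        else:
            lo = mid
    return hi


eps, norm_delta, norm_inf = 0.1, 1.0, 1.0  # illustrative values only
n_eps = smallest_n_eps(eps, norm_delta, norm_inf)
K = eps ** 2 / (4 * norm_delta ** 2)  # the rate constant used for (6.6)
```

For small $\varepsilon$ the threshold grows roughly like $\ln(1/\varepsilon)/\varepsilon^2$, so the exponential bound (6.6) only becomes available for fairly long sequences.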
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